(57) Abstract

The present invention relates to a method for the comparative 
assessment of the level of specific nucleic acid sequences in 
samples derived from different sources. More specifically, the 
invention relates to a method using oligonucleotides covalently 
linked to a solid support, such as beads, to isolate specific labeled 
nucleic acid sequences from complex mixtures. The methods
disclosed allow quantitative comparisons of the amount of nucleic
acid of defined sequence in a plurality of different samples of
nucleic acid, e.g., from different cells or tissues or from genetic
libraries. Nucleic acids from the samples are labeled in such 
a fashion that the signals can be distinguished and compared 
following hybridization to the oligonucleotides on the beads. 
According to the invention, the solid supports with the hybridized 
nucleic acid may be retrieved, and the target nucleic acid eluted 
and analyzed. Furthermore, the invention provides a method for
marking individual clones from a cDNA library, such that they can
be identified uniquely and retrieved by hybridization to specific
beads.

METHODS FOR MEASURING RELATIVE AMOUNTS OF
NUCLEIC ACIDS IN A COMPLEX MIXTURE AND
RETRIEVAL OF SPECIFIC SEQUENCES THEREFROM

I. FIELD OF THE INVENTION

The present invention relates generally to methods and compositions for the
quantitation and isolation of specific nucleic acids from complex mixtures of nucleic
acids. The methods of the invention allow for the comparative assessment of the
expression levels of genes in samples derived from different sources, e.g., different
tissue or cell types, or disease or developmental stages. The invention also relates to
sorting large populations of nucleic acids based on quantitative measures of 
abundance in such a manner that the nucleic acids can be retrieved for subsequent 
molecular biological experiments. 

II. BACKGROUND OF THE INVENTION

Differential Gene Expression. The pathology of many diseases involves
[...] be distinguished by the types of active genes and their expression levels. For
example, cancer cells evolve from normal cells to highly invasive, metastatic
malignancies, which frequently are induced by activation of oncogenes, or
inactivation of tumor suppressor genes. See, The National Cancer Institute, "The 
Nation's Investment In Cancer Research: A Budget Proposal For Fiscal Years 
1997/98", Prepared by the Director, National Cancer Institute, pp. 55-77. Altered
expression patterns of oncogenes and tumor suppressor genes in turn effect
dramatic changes in the expression profiles of numerous other genes. Differentially
expressed sequences can serve as markers of the transformed state and are,
therefore, of potential value in the diagnosis and classification of tumors.
Differences in gene expression, which are not the cause but rather the effect of
transformation, may be used as markers for the tumor stage. Thus, the assessment
of the expression profiles of known tumor-associated genes has the potential to
provide meaningful information with respect to tumor type and stage, treatment
methods, and prognosis. Furthermore, new tumor-associated genes may be
identified by systematically comparing the expression of genes in tumor specimens
with their expression in control tissue. Genes whose levels are increased in tumors
relative to normal cells are candidates for genes encoding growth-promoting
products, e.g., oncogenes. In contrast, genes whose expression is reduced in
tumors are candidates for genes encoding growth-inhibiting products, e.g., tumor
suppressor genes or genes encoding apoptosis-inducing products. Generally, the
underlying premise is that the profiles of gene expression may point to the
physiological function or malfunction of the gene product in the organism.

Pathological gene expression differences are not confined to cancer.
Autoimmune disorders, restenosis, atherosclerosis, neurodegenerative diseases, and
others are expected to involve aberrant expression of particular genes.
Efforts have been made in recent years to identify and isolate
genes relevant to these diseases. Accordingly, an efficient method allowing the
comparative assessment of the relative amounts of nucleic acids in complex
mixtures, and the retrieval of specific nucleic acids from those complex mixtures,
would be an extremely valuable tool for genetic and medical research.

In the past, the comparison of the expression levels of specific transcripts
among different cell or tissue types, tissues or cells derived from different disease or
developmental stages, or from cells exposed to different stimuli has provided
meaningful information with respect to a gene's function or its role in the
development of a disease. Approaches based on the determination of differences in
the expression profiles of genes have facilitated the identification of novel genes
encoding products having a function of interest. For example, such approaches
have permitted the identification of several genes, such as T cell receptor genes
(Yanagi et al., 1984, Nature 308:145-149), and a number of tumor suppressor
genes, including p21 (el-Deiry et al., 1993, Cell 75:817-825; Noda et al., 1994,
Exp. Cell Res. 211:90-98). Further, comparative assessment of relative amounts of
nucleic acids has the potential to provide a valuable parameter for the organization
of sequence information obtained through large scale sequencing approaches.

Genetics. Methods that permit the rapid enrichment and subsequent
identification of sequences that cause specific changes in cell behavior are highly
desirable. With these methods, specific functions may be assigned to genes or gene
fragments based on their activity in cells. Traditional genetics involves isolation of
mutants that have particular phenotypes. In combination with modern molecular
methods, it is possible to isolate the mutant genes responsible for a specific
phenotype. See, e.g., Kamb et al., 1987, Cell 50:405-410. In general, however, the
process of positional gene cloning, i.e., cloning a gene based on its genetic location,
[...] It is also possible to clone genes by expression. For example, several
oncogenes have been identified based on their ability to cause cell proliferation
when introduced into cells. Der et al., 1982, Proc. Natl. Acad. Sci. U.S.A.
79:3637-3640; Parada et al., 1982, Nature 297:474-478. It is especially valuable to
use methods that can not only identify sequences that enhance cell proliferation, but
also identify sequences that inhibit cell growth. Even more valuable are methods
that can identify such sequences that have effects specific to certain cell types (e.g.,
a sequence that inhibits growth of tumor cells but not normal cells). The method
described herein is capable of achieving such results.

Differences In Genomic DNA. Differences in genomic DNA are the
underlying basis for differences between species and for much of the individual
variation within a species. Furthermore, many pathological disorders, i.e., genetic
disorders, are driven by chromosomal mutations. Rowley, 1990, Cancer Res.
50:3816-3825. Identification of differences in the genome and understanding of
their effect on the phenotype of the organism provides valuable insight into the
development of inherited diseases.

Many methods have been used to characterize variation between different 
DNA samples. These involve crude methods of analysis such as overall DNA base 
composition, melting curves, solution hybridization at different stringencies, and 
measurements of percentages of modified bases and genome size. Progressively
more refined methods have been applied over the years, including restriction
mapping and DNA sequence analysis. Botstein et al., 1980, Am. J. Hum. Genet.
32:314-331; Lipshutz et al., 1995, Biotechniques 19:442-447. Ultimately, the
DNA sequence gives the most detailed and reliable information. However,
sequencing, as a systematic approach for genomic analysis, is slow and expensive.
Indeed, genomic sequencing has been limited to a few particularly interesting genes
or genetic intervals.

Thus, there is an unmet need for an efficient method that allows direct
screening of genomic DNA to detect differences in DNA sequence, ploidy (copy
[...]

Current Means For The Quantitative Determination Of Relative
Amounts Of Specific Nucleic Acids. The technical hurdles associated with the
quantitative determination of relative amounts of nucleic acids, e.g., the
determination of mRNA profiles or the determination of sequence ploidy, are
daunting. Often, only a few copies of a particular nucleic acid may be present
within complex mixtures. For example, many transcripts are present only at a very
low abundance. Thus, a highly sensitive method is required to detect as little as one
mRNA molecule per cell. In the case of genomic DNA, it might be desired to
detect deletions or amplifications against a background of 3 x 10^9 base pairs in the
human genome. Furthermore, the availability of sample mRNA/cDNA/genomic
DNA may be rather limited. Thus, the absolute number of nucleic acid molecules in
a sample may be very small. Moreover, the expression levels of genes vary greatly,
ranging from a single mRNA molecule per cell up to about 5,000 mRNA molecules
per cell. Given 10,000 different mRNA types per cell on average, and a total of
500,000 mRNA molecules per cell, the required detection range is tremendous.
Additionally, the level of each specific nucleic acid molecule (mRNA, cDNA,
genomic DNA fragment) must be determined separately with a corresponding
specific probe, which may be labor- and resource-intensive.
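
The scale of the detection problem can be made concrete with the figures quoted above. The following is a minimal sketch that uses only the numbers stated in the text (about 500,000 mRNA molecules per cell, and per-transcript copy numbers from 1 to about 5,000); the function and constant names are illustrative and not part of the disclosure.

```python
import math

TOTAL_MRNA_PER_CELL = 500_000   # total mRNA molecules per cell (figure from the text)
MIN_COPIES = 1                  # rarest transcripts: a single molecule per cell
MAX_COPIES = 5_000              # most abundant transcripts: about 5,000 per cell


def abundance_fraction(copies_per_cell: int, total: int = TOTAL_MRNA_PER_CELL) -> float:
    """Fraction of the cellular mRNA pool contributed by one transcript species."""
    return copies_per_cell / total


rare = abundance_fraction(MIN_COPIES)             # one molecule in 500,000
abundant = abundance_fraction(MAX_COPIES)         # one molecule in 100
dynamic_range = MAX_COPIES / MIN_COPIES           # fold difference to be covered
orders_of_magnitude = math.log10(dynamic_range)   # between 3 and 4 decades

print(f"rare transcript fraction:     {rare:.1e}")
print(f"abundant transcript fraction: {abundant:.2f}")
print(f"dynamic range:                {dynamic_range:.0f}-fold")
```

Under these assumptions a method must score a species that makes up two parts per million of the pool while also resolving species several thousand times more abundant.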

To date, a number of general methods have been developed to quantify 
nucleic acid molecules. Many of the available methods are suited to assess presence 
or absence, or relative amounts of specific nucleic acids, in particular mRNA, 
expressed in different cell or tissue types. However, each of these methods has
problems, especially when the objective is to analyze large numbers of targets and
the available amounts of sample nucleic acids are a limiting factor.

A traditional method for the assessment of mRNA expression profiles is 
Northern blot analysis. Crude RNA or mRNA derived from different sources is 
separated by gel electrophoresis and transferred to a nitrocellulose or nylon filter.
Immobilized on the filter, the mRNA is hybridized with a probe corresponding to
sequences of the gene of interest. See, Sambrook et al., 1990, Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.
Northern blot analysis is a highly sensitive approach for determining the expression
profile of small numbers of sequences of interest. However, this type of assay is not
suited for analysis of large numbers of probes.

A second approach for the determination of mRNA expression profiles 
based on identification of differentially expressed sequences employs DNA probe
hybridization to filters. Palazzolo et al., 1989, Neuron 3:527-539; Tavtigian et al.,
1994, Mol. Biol. Cell 5:375-388. In this method, phage or plasmid DNA libraries,
typically cDNA libraries, are plated at high density on duplicate filters. The two
filter sets are screened independently with cDNA prepared from two sources. The
signal intensities of the various individual clones are compared between the two
duplicate filter sets to determine which clones hybridize preferentially to cDNA
from one source compared to the other. These clones are isolated and tested to
verify that they represent sequences that are preferentially present in one of the two
original samples. The major drawback with this approach is its lack of sensitivity.
It is typically impossible to identify differentially expressed sequences that are
present in amounts of less than one (1) occurrence in as many as 1,000 to 10,000
sequences. In addition, for detection there must be a relatively large disparity in
expression of a particular sequence.

A third approach involves the screening of cDNA libraries derived from 
subtracted mRNA populations. Hedrick et al., 1984, Nature 308:149-153. The
method is closely related to the method of differential hybridization described
above, but the cDNA library is prepared so as to favor clones from one mRNA
sample over another. This is typically accomplished by a subtractive step prior to
cloning in which the first strand of the cDNA from the first sample is hybridized to
an excess of mRNA from the second sample, whereby the DNA/RNA
heteroduplexes are removed. The remaining single-stranded cDNA is converted
into double-stranded cDNA and cloned into a phage or plasmid vector. The
subtracted library so generated is depleted for sequences that are shared between
the two populations of mRNA and enriched for those that are uniquely present in the
first sample. Clones from the subtracted library can be characterized directly.
Alternatively, they can be screened by a subtracted cDNA probe, or on duplicate
filters using two different probes as above. The advantage of this method is that the
number of clones which need to be screened and analyzed is small. However,
differential hybridization is technically very difficult. Furthermore, it lacks
sensitivity, and is only suited for identification of differentially expressed sequences
that are present in relative amounts higher than about one in 1 x 10^[...].

A fourth approach involves Expressed Sequence Tag (EST) sequencing.
Lennon et al., 1996, Genomics 33:151-152. This method involves the direct
analysis of individual clones from cDNA libraries by DNA sequencing. Libraries
are generated from two sources that are the objects of comparison, and individual
inserts of the libraries are sequenced. The frequency of particular sequences,
reflecting the relative abundance of specific sequences, is recorded for each library.
The most significant drawback of EST sequencing is its extreme time and resource
inefficiency. In order to provide a reasonable sampling of each library, many
thousands of individual insert sequences must be analyzed.

A fifth approach is Serial Analysis of Gene Expression (SAGE). Velculescu
et al., 1995, Science 270:484-487. SAGE is closely related to the above method of
EST sequencing. However, the libraries are constructed in such a way that small
portions of many individual cDNAs are ligated together in tandem in a single
vector. This has, compared to the EST approach, the advantage that multiple
cDNAs are analyzed with each sequencing run, which greatly reduces the amount of
sequencing that must be carried out to achieve a similar level of completeness.

Since a stretch of roughly a dozen nucleotides is sufficient to identify a transcript, a single sequencing
run can sample up to about fifty transcripts, rather than a single transcript as in the
EST sequencing method. Nevertheless, the process is largely serial and necessitates
sequencing of all cDNAs that are present in equal amounts between the two samples,
as well as those that are differentially expressed. This produces significant
redundancy. 

A sixth approach involves the differential display of mRNA. Liang et al.,
1995, Methods Enzymol. 254:304-321. PCR primers of arbitrary sequence, or
designed to optimize the desired pseudo-random amplification, are used to amplify
sequences from two mRNA samples by reverse transcription, followed by PCR.
The products of these amplification reactions are run side by side, i.e., pairs of lanes
contain the same primers but different mRNA samples, on DNA sequencing gels.
Differences in the extent of amplification can be detected by eye. Bands that appear
to be differentially amplified between the two samples can be excised from the gel
and reamplified for characterization. If the collection of primers is suitably large, it
is generally possible to identify at least one fragment that is differentially amplified
in one sample compared with the second. The disadvantage of the method is its
explicit reliance on random events, and the vagaries of PCR, which strongly bias the
subset of sequences that can be detected by the method.

Yet another approach is Representational Difference Analysis (RDA) of
nucleic acid populations from different samples. Lisitsyn et al., 1995, Methods
Enzymol. 254:291-304. RDA uses PCR to amplify fragments that are not shared
between two samples. A hybridization step is followed by restriction digests to
remove fragments that are shared from participation as templates in amplification.
An amplification step allows retrieval of fragments that are present in higher
amounts in one sample compared to the other. Again, the method is subject to the
limitations of PCR and DNA hybridization, which tend to bias the results strongly
toward certain fragments and away from others. Furthermore, the final products of
RDA are not representative of the differences that exist between the two input
samples. RDA can be used with cDNA or with genomic DNA fragments to identify
[...]

An eighth approach for the identification of differentially expressed
sequences involves hybridization of labeled mRNA or cDNA in solution to DNA
fragments or oligonucleotides attached to a solid support in high density arrays.
Schena et al., 1995, Science 270:467-470. Since the arrays contain known
sequences placed in defined locations, the hybridization signal intensities permit an
assignment of the relative amount of target nucleic acid capable of hybridizing to a
particular probe sequence. The method is parallel, rapid, and sensitive.
Disadvantages are that the sequences in the array must be known beforehand, and
that the hybridizing sequences cannot easily be recovered from the surface of the
array.

While some of the above methods permit the determination of expression 
profiles of genes and the identification of sequences that have particular expression 
patterns, most are not sufficiently efficient and sensitive for comparative assessment 
of nucleic acids on a large scale. Thus, for example, none allows quantitative
detection and sorting of nucleic acids at a level of efficiency and sensitivity sufficient
to perform genetic experiments involving complex libraries, such as expression 
libraries, passaged through cells. All existing methods have defects in either 
sensitivity, speed, comprehensiveness, or the ability to recover specific sequences, 
e.g., from a genetic library. 

Therefore, the methods of the present invention, allowing the simultaneous
assessment of relative amounts of multiple mRNA species in two or more samples
in an efficient manner and the recovery of sequences that have particular effects on
cell phenotypes, provide a long desired improvement over currently available
methods. The methods of the invention also provide other advantages, such as
increasing the throughput of probes, boosting the generation of valuable [...], and
significantly lowering the time and cost of analysis. Solid supports, specifically
microspheres, have been used to bind nucleic acid in solution, but not for
the applications described for the invention herein (e.g., Bush et al., 1992, Anal.
Biochem. 202:146-151; Meszaros and Morton, 1996, BioTechniques 20:413-419).

III. SUMMARY OF THE INVENTION

The invention described herein provides methods and compositions for the
detection and isolation of specific target nucleic acids from a complex mixture of
nucleic acids. The methods of this invention enable quantitative comparisons of
numerous individual sequences and recovery of those that have specific relative
abundance with reference to other sequences in a mixture of nucleic acids, and/or to
the same target nucleic acid in a different complex mixture. Thus, the present
invention solves several problems encountered in the sorting and retrieval of nucleic
acid sequences from complex sequence mixtures.

The methods of the present invention allow direct assessment of the relative 
abundance of specific nucleic acids in samples derived from different sources, for
example, from different tissue or cell types, and disease or developmental stages.
The present invention further permits the application of such sorting and retrieval
techniques to genetic experiments that involve passage of libraries, such as
expression libraries, through host cells. The passaged libraries may then be
retrieved and the library sequence subsets compared. Using these methods,
sequences which have specific effects on one or more cell phenotypes may be
recovered.

In addition, the methods of this invention are amenable to cycling and 
enrichment procedures. This, in turn, enables the methods to be applied to genetic 
selections that are relatively non-stringent because the selection can be applied
[...] (e.g., 100 fold per cycle), can be applied repeatedly, thus producing a multiplicative
[...]

The invention also provides a method for selecting large numbers of
identifier sequences that compose a set, the individual members of which do not
cross-hybridize with other members' complementary sequences under chosen
conditions. The method for selection and synthesis of this set of sequences is
simple and rapid. The invention provides synthesis of identifier sequences in a
combinatorial fashion for attachment to the target nucleic acids, synthesis of the
identifier sequence complements on beads, hybridization of the two components
(target and beads), detection of the hybridization results and the collection of
sequences with desirable properties based on their abundance profiles.
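
One simple way to picture the selection of a non-cross-hybridizing set is a greedy filter over candidate sequences. The sketch below is an illustration only and is not the source code of FIGURE 16, which is not reproduced here. It assumes that the number of mismatched positions between two aligned sequences is an acceptable stand-in for cross-hybridization; it ignores shifted alignments, secondary structure and melting temperature. The names `mismatches` and `select_units` and the threshold of three mismatches are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_units(length: int, min_mismatches: int, max_units: int) -> list[str]:
    """Greedily keep candidates that differ from every kept unit in at
    least `min_mismatches` positions (a crude cross-hybridization proxy)."""
    chosen: list[str] = []
    for letters in product(BASES, repeat=length):
        candidate = "".join(letters)
        if all(mismatches(candidate, kept) >= min_mismatches for kept in chosen):
            chosen.append(candidate)
            if len(chosen) == max_units:
                break
    return chosen


# Assumed example: 100 candidate 8-mers, any two differing in >= 3 positions.
units = select_units(length=8, min_mismatches=3, max_units=100)
print(len(units), units[:3])
```

A practical selection would likely add further filters (for example on base composition or self-complementarity); the greedy pass only shows that a mutually dissimilar set can be assembled quickly.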

Using the methods and compositions of the invention, the specificity of
hybridization is sufficient to permit distinguishing of upwards of 10,000 individual
sequences in a single hybridization reaction; that is, under the chosen conditions, the
signal of correctly hybridized target nucleic acid is readily distinguishable from the
background noise caused by non-specific hybridization. In addition, the identifier
sequences of this invention are capable of hybridizing with kinetics rapid enough to
allow numerous experiments to be performed in relatively short periods of time.

Accordingly, the invention vastly broadens the scope of genetic selections 
that can be employed in genetic experiments by enabling the recovery of sequences 
that affect phenotypes of cells (e.g., growth regulators); the normalization of
libraries and selected library subsets such that more numerous and more diverse
sequences can be recovered in a single experiment; the comparison between
libraries that have been passaged through different cell types or cells in different
physiological states; the application of negative selections in which sequences that
[...]

Generally, the invention employs solid supports, referred to as beads, that
have stably attached to their surface oligonucleotides or nucleic acid fragments,
collectively referred to as "capture oligonucleotides". The capture oligonucleotides
are synthesized in such a way that each bead contains multiple copies of one
oligonucleotide sequence, typically 1 x 10^[...] to 1 x 10^[...], linked to the bead surface.
Thus, the population of beads may contain several million different capture
oligonucleotides, each bead having only one type of capture oligonucleotide 
attached to its surface. The beads with the attached unique capture 
oligonucleotides are used as hybridization probes in solution. The target nucleic 
acids are labeled with a marker, preferably a visual marker, most preferably a 
fluorophore, to permit detection by instruments such as the automated fluorescence 
activated cell sorter. Typically, target nucleic acids derived from different sources 
are labeled with different fluorophores which can readily be distinguished. 

In one aspect of the invention, the target nucleic acids from the first source 
are linked to a first label, and the target nucleic acids from the second source are 
linked to a second label. The labeled target nucleic acids from the different sources
are pooled and contacted with a number of beads each having attached thereto
capture oligonucleotides of a unique sequence, under conditions that promote the
formation of perfectly matched duplexes between the capture oligonucleotides and
nucleic acid molecule complements within the pool. Subsequently, the beads are
sorted according to the relative amount of the first label and the second label, and
beads of interest are retrieved. Finally, the identity of nucleic acid molecules which
have a defined ratio of first and second label is determined. 
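
The sorting step can be pictured as a gate on the ratio of the two label signals measured for each bead. The following is a minimal sketch under stated assumptions: each bead is reduced to two background-corrected intensities, a brightness floor excludes beads too dim to score, and the thresholds are arbitrary. The `Bead` record, the function names and the example values are hypothetical and do not describe any particular instrument.

```python
import math
from dataclasses import dataclass


@dataclass
class Bead:
    capture_id: str   # hypothetical name for the capture oligonucleotide on the bead
    label1: float     # signal from the first label (first source)
    label2: float     # signal from the second label (second source)


def label_ratio(bead: Bead) -> float:
    """Ratio of first-label to second-label signal; infinite if label2 is absent."""
    if bead.label2 <= 0:
        return math.inf
    return bead.label1 / bead.label2


def sort_beads(beads, min_ratio: float, min_total: float):
    """Keep beads bright enough to score whose label1/label2 ratio is >= min_ratio."""
    return [
        b for b in beads
        if b.label1 + b.label2 >= min_total and label_ratio(b) >= min_ratio
    ]


beads = [
    Bead("cap-001", 900.0, 100.0),   # enriched in the first source
    Bead("cap-002", 500.0, 480.0),   # roughly equal in both sources
    Bead("cap-003", 5.0, 1.0),       # high ratio but too dim to score
    Bead("cap-004", 300.0, 0.0),     # detected only in the first source
]
retrieved = sort_beads(beads, min_ratio=5.0, min_total=50.0)
print([b.capture_id for b in retrieved])
```

Reversing the ratio, or gating on a window around 1.0, would select sequences enriched in the second source or present at similar levels in both.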

In another aspect of the invention, relative amounts of transcript levels in 
cells are determined. For example, approximately equal amounts of mRNA or 
cDNA derived from two different cell or tissue types are labeled with two different
markers, preferably fluorophores, and contacted with the beads [...]
two samples. Differences in abundance are identified, and the relevant sequences
are recovered and characterized. These differences may involve mRNAs/cDNAs
that are over-represented in one population as
compared to the other.

In another aspect of the invention, genomic DNA derived from different
sources is compared to identify copy numbers of specific chromosomal regions or
loci, thereby identifying regions which are deleted or amplified, e.g., in samples
derived from tumor tissue. In yet other aspects, genomic DNA fragments are linked
to reporter genes to assess, for example, promoter activity of specific genomic
DNA fragments in different cells.

Yet another strategy involves attachment of identifier tags to cloned DNA
fragments. The identifier tags of the invention are selected to have minimal
cross-hybridization activity. Typically, the identifier tags have the form of tandem
multimers of simpler sequence units of about two (2) to about fifteen (15)
nucleotides in length, preferably of about seven (7) to about twelve (12), and more
preferably of about seven (7) to about nine (9) nucleotides in length. In one
preferred embodiment of the invention, sequence identifier tags comprise a
combination of between two (2) and six (6) sequence units in tandem, each unit
consisting of from about seven (7) to about fifteen (15) nucleotides.

In another preferred embodiment of the invention, a family of identifier tags 
consists of a 24-mer, composed of combinations of three 8-mers. This population 
of 24-mers can be synthesized in 100 automated DNA synthesis columns using two 
stages of "split and recombine" synthesis. After completion of the last round of 
couplings, the result is a family of identifier tags comprising a degeneracy of about
1 x 10^6 (100 x 100 x 100). If the individual 8-mers are chosen propitiously, the
greatest similarity between any two members of the family can be minimized. In
cases where the target nucleic acids are linked to such identifier tags, the beads, as a
variation, are synthesized with the "complements" of the above identifier tags as
capture oligonucleotides.
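
The combinatorial arithmetic behind the 24-mer family, and the pairing of each tag with its bead-bound complement, can be sketched as follows. This is an illustration under assumptions: the three 8-mer units are arbitrary toy sequences, the same unit set is reused at every position, and the capture oligonucleotide is taken to be the reverse complement of the tag. None of the names below come from the disclosure.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def family_size(units_per_position: int, positions: int) -> int:
    """Number of distinct tags when each position draws from the same unit set."""
    return units_per_position ** positions


def build_tags(units, positions: int = 3):
    """Yield every tandem combination of `positions` units."""
    for combo in product(units, repeat=positions):
        yield "".join(combo)


# Toy unit set (assumed); the text describes 100 units per position.
toy_units = ["ACGTACGT", "TTGCAAGC", "GGATCCAA"]
tags = list(build_tags(toy_units))
capture = {tag: reverse_complement(tag) for tag in tags}

print(len(tags), len(tags[0]))   # tags built from the toy set and their length
print(family_size(100, 3))       # size of the family described in the text
```

With 100 units per position the same enumeration gives the million-member family, which is why the set is built by pooled synthesis rather than one tag at a time.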

An important aspect of the invention relates to methods for the 
determination of the relative abundance of individual cDNA (or genomic DNA) 
inserts in a genetic library, wherein the individual inserts are linked to unique
identifier tags, which have been passaged through different cell types. This
approach, referred to as "post-passage library comparison", permits identification
and recovery of specific DNA sequences from the original library that are increased
in abundance after passage through one cell type compared to the other. These
sequences are candidates for genes or gene fragments that either selectively
promote cell growth or inhibit cell growth. 

In yet another aspect, the invention relates to methods for the normalization 
of cDNA libraries, i.e., a process to convert a cDNA library that represents different 
mRNAs according to their abundance in the cell into a library that represents the 
different mRNAs in roughly equal amounts.
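
The goal of normalization can be illustrated with a small calculation: given a measured relative abundance for each clone, how much must each be amplified to reach a common level? The sketch below is hypothetical throughout. It assumes every abundance is positive, that all clones are raised to the level of the most abundant one, and that amplification doubles the amount in every cycle; the clone names and values are invented for the example.

```python
import math


def amplification_factors(abundance: dict[str, float]) -> dict[str, float]:
    """Fold amplification each clone needs to match the most abundant clone."""
    top = max(abundance.values())
    return {clone: top / amount for clone, amount in abundance.items()}


def doubling_cycles(factor: float) -> int:
    """Smallest number of ideal doublings giving at least `factor`-fold gain."""
    return math.ceil(math.log2(factor))


# Invented relative abundances (copies per cell) spanning the range in the text.
abundance = {"clone_a": 5000.0, "clone_b": 100.0, "clone_c": 1.0}
factors = amplification_factors(abundance)
cycles = {clone: doubling_cycles(f) for clone, f in factors.items()}

for clone in abundance:
    print(clone, factors[clone], cycles[clone])
```

Grouping clones into abundance classes before adjustment, rather than treating each clone individually, would reduce the number of separate reactions needed.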

Finally, the invention relates to methods for the recovery, identification and 
analysis of sequences that have a specific relative abundance in two populations of 
nucleic acid, e.g., mRNA, cDNA or genomic DNA. 

IV. BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 depicts the fluorescence activated cell sorting of beads with
labeled nucleic acids attached thereto, as described in Example 2, infra.

FIGURE 2 depicts the sensitivity of the oligonucleotide-conjugated beads in
hybridization and fluorescence activated cell sorting analyses, as described in
Example 3, infra.

FIGURE 3 depicts a representation of results of a fluorescence activated cell
sorting analysis showing sensitivity of the oligonucleotide-conjugated beads when
1% of the beads hybridize to the target and 99% do not, as described in Example 4,
infra.

FIGURE 4 depicts the signal-to-noise ratio in the presence of 10 [...]
nonspecific sequences, as described in Example 5, infra.

FIGURE 5 depicts the sorting of labeled beads based on fluorescence [...]

FIGURE 6 depicts the concept of the "split and recombine" synthesis
strategy for the generation of random N-mers wherein N is the length of the
oligonucleotide, as described in Example 7, infra.

FIGURE 7 depicts the concept of the "split and recombine" synthesis
strategy for the generation of sequence identifier tags, as described in Example 8,
infra.

FIGURE 8 depicts the use of sequence identifier tags. Three strategies to
capture specific sequences from a complex mixture of nucleic acids using sequence
identifier tags are illustrated. The first, at the top of the drawing, involves use of
random (or pseudorandom), e.g., 15-mers attached to beads. The second strategy
involves the capture of oligo(dT)-primed cDNA. The third strategy, depicted at the
bottom half of the drawing, involves priming of the mRNA with a mixture of
24-mers, one million-fold degenerate in total. See, Example 9, infra.

FIGURE 9 depicts the hybridization discrimination of identifier tags, as
described in Example 10, infra.

FIGURE 10 depicts the generation of double stranded cDNA marked with 
identifier tags, as described in Example 11, infra. 

FIGURE 1 1 depicts the enrichment and recovery of cDNAs prepared fi-om 
two different sources, as described in Example 12, infra. 
20 FIGURE 12 depicts the concept of post-passage library comparison, as 

described in Example 13, infra. 

FIGURE 13 depicts normalization of cDNA libraries by hybridization to
beads using, e.g., the 24-mer identifier tags, grouping of clones according to
relative amounts and subsequent adjustment of amounts by, e.g., PCR, to form the
final normalized pool of cDNAs, as described in Example 14, infra.

FIGURE 14 depicts the quantitative comparison of mRNA levels in a
[...] assay, as described in Example 15, infra.

FIGURES 15A and 15B depict kinetic genetics involving the passage of,
e.g., a cDNA library through two different cell types, as described in Example 16,
infra.

FIGURE 16 depicts a C++ source code for the selection of 8-mer sequences
that comprise a set with minimal cross-hybridization of the constituent members, as
described in Example 17, infra.

FIGURE 17 depicts flow cytometric histograms of fluorescence intensities
of individual beads from a population hybridized to target complementary identifier
sequences, as described in Example 19, infra.

(A) Autofluorescence of 13,824 different identifier sequence-tagged beads
(FL1 = 525 +/- 20 nm light; FL2 = 575 +/- 15 nm light).

(B) Specific labeling of 7.9% of the 13,824 different identifier sequence-tagged
beads with HEX-labeled complementary identifier sequence tags (ID Tags)
that were synthesized on an oligo synthesizer.

FIGURE 18 depicts flow cytometric histograms of fluorescence intensities
of fluorescently labeled RNA transcripts (approximately 60 bases in length)
comprising 24 base oligonucleotide identifier tags at their 5' end (A; "5' bead"); 3'
end (B; "3' bead"); or approximately in the middle of the transcript (C; "Mid bead");
hybridized to beads with attached complementary capture oligonucleotides, as
described in Example 18, infra. Beads with attached DNA capture
oligonucleotides which were not complementary to the oligonucleotide tags (i.e.,
non-specific sequences) were used as a control (D; "NS bead"). "Bead alone": no
target nucleic acid added to the beads during hybridization; "2 nM 5'c'" (control): a
24 base RNA transcript (2 nM) having perfect complementarity to the capture
oligonucleotide [...]; "[...] DNA" (control): a single-stranded DNA construct (2 nM)
having the same sequence as the test RNA transcript was added to the beads during
hybridization; [...]: the test RNA transcript was
added (5 µM or 1 µM) to the beads during hybridization; "20 µM Non-specific"
(control): 20 µM of random DNA oligonucleotide sequences was added to the
beads during hybridization.

V. DEFINITIONS 

Terms used herein are in general as typically used in the art. The following
terms are intended to have the following general meanings as they are used herein:

The term "complement" refers to a nucleic acid sequence to which a
second nucleic acid sequence specifically hybridizes to form a perfectly matched
duplex or triplex.

The term "cognate" refers to a sequence capable of forming a perfectly
matched (see infra) duplex with its complement in the reaction mixture.
"Non-cognate" refers to non-perfectly matched duplexes that may form, especially
between sequences that share very little complementary sequence to permit
Watson-Crick base-pairing.

The term "oligonucleotide" includes linear oligomers of natural or modified
monomers or linkages, including deoxyribonucleosides, ribonucleotides, α-anomeric
forms thereof, further peptide nucleic acids, and the like, capable of specifically
binding to a target polynucleotide by way of a regular pattern of
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually,
monomers are linked by phosphodiester bonds or analogs thereof to form
oligonucleotides ranging in size from a few monomeric units, e.g., three (3) to four
(4), to several tens of monomeric units. Whenever an oligonucleotide is
represented by a sequence of letters, such as "ATGCCTG," it will be understood
that the nucleotides are in 5'→3' order from left to right and that "A" denotes
deoxyadenosine [...]. Analogs of phosphodiester linkages
include phosphorothioate, phosphorodithioate, phosphoranilidate,
phosphoramidate, and the like. Usually oligonucleotides of the invention comprise
the four natural nucleotides; however, they may also comprise non-natural
nucleotide analogs. It is clear to those skilled in the art when oligonucleotides
having natural or non-natural nucleotides may be employed, e.g., where processing
by enzymes is called for, usually oligonucleotides consisting of natural nucleotides
will be required.

The phrase "perfectly matched" in reference to a duplex means that the
poly- or oligonucleotide strands of a duplex form a double-stranded structure with
one other oligonucleotide strand such that every nucleotide in each strand
undergoes Watson-Crick base pairing with a nucleotide in the other strand. The
term also comprehends the pairing of nucleoside analogs, such as deoxyinosine,
nucleotides with 2-aminopurine bases, and the like, that may be employed. In
reference to a triplex, the term means that the triplex consists of a perfectly matched
duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse
Hoogsteen association with a base pair of the perfectly matched duplex.

A "mismatch" in a duplex between a tag and an oligonucleotide means that
a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick
and/or Hoogsteen and/or reverse Hoogsteen bonding. A single mismatch refers to a
single non-Watson-Crick basepaired position in the duplex; a double mismatch
refers to two mispaired bases, either in tandem or separated by one or more
correctly paired positions; etc.
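
For the simple duplex case, the definitions of "perfectly matched" and "mismatch" can be illustrated with a short Python sketch. The function names (reverse_complement, count_mismatches) are assumptions chosen for illustration and do not appear in the disclosure. Only Watson-Crick pairing of the four natural DNA bases is modeled; nucleoside analogs, triplexes, and Hoogsteen pairing are not.

```python
# Illustrative sketch only: Watson-Crick duplexes of the four natural DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand (5'->3') that forms a perfectly matched duplex with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(tag, capture):
    """Count non-Watson-Crick positions when tag anneals antiparallel to capture.

    Both strands are written 5'->3' and must have equal length.
    """
    if len(tag) != len(capture):
        raise ValueError("strands must have equal length")
    partner = reverse_complement(capture)
    return sum(1 for a, b in zip(tag, partner) if a != b)


tag = "ATGCCTG"
perfect = reverse_complement(tag)        # CAGGCAT
single = "CAGGCAA"                       # same strand with one position altered
print(count_mismatches(tag, perfect))    # 0: perfectly matched duplex
print(count_mismatches(tag, single))     # 1: single mismatch
```

A count of zero corresponds to a perfectly matched duplex, a count of one to a single mismatch, and so on.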

The term "nucleotide" includes the natural nucleotides, including 2'-deoxy
and 2'-hydroxyl forms, analogs and derivatives thereof, further synthetic nucleotides
having modified base moieties and/or modified sugar moieties, e.g., described by
Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman,
1990, Chemical Reviews 90:543-584, and the like, with the only proviso that they are
capable of specific hybridization. Such analogs include synthetic nucleotides
designed to enhance binding properties, reduce degeneracy, increase specificity, and
the like.

A "linker" is a moiety, molecule, or group of molecules attached to a solid
support, referred to as a bead, and spacing a synthesized polymer or oligomer, e.g.,
an oligonucleotide or other nucleic acid fragment, from the bead.

A "bead" refers to solid phase supports for use with the invention. Such
beads may have a wide variety of forms, including microparticles, beads,
membranes, slides, plates, micromachined chips, and the like. Likewise, solid phase
supports of the invention may comprise a wide variety of compositions, including
glass, plastic, silicon, alkanethiolate-derivatized gold, cellulose, low cross-linked
and high cross-linked polystyrene, silica gel, polyamide, and the like. Other
materials and shapes may be used, including pellets, disks, capillaries, hollow fibers,
needles, solid fibers, cellulose beads, pore-glass beads, silica gels, polystyrene beads
optionally crosslinked with divinylbenzene, grafted co-poly beads, poly-acrylamide
beads, latex beads, dimethylacrylamide beads optionally cross-linked with
N,N'-bis-acryloyl ethylene diamine, and glass particles coated with a hydrophobic
polymer, etc., i.e., a material having a rigid or semirigid surface.

An "identifier tag" refers to a nucleotide sequence that can be attached via
ligation or primed synthesis onto individual nucleic acid molecules, thus providing
unique or almost unique means for identification and retrieval. For purposes of the
invention, the length of an identifier tag is from about ten (10) to about ninety (90)
bases and typically ranges from about ten (10) to about forty (40) bases.

The term "genetic library" refers to a collection of DNA fragments derived
from mRNA, genomic DNA or synthetic DNA (non-natural DNA sequence)
propagated in a vector that may be plasmid or virus based. The size of a genetic
library may vary from a few individual inserts (or clones) up to many millions of
clones.

The term "random sequence" refers to a set of nucleotide sequences of
specified length such that the entire population encompasses every possible
sequence of that length. Thus, a random sequence of length N contains 4^N distinct
individual sequences.
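
As a brief illustration of the 4^N relationship, the following Python sketch enumerates every sequence of a given length. The function name random_sequence_set is an assumption made for this example only.

```python
from itertools import product


def random_sequence_set(n):
    """Enumerate every DNA sequence of length n (4**n sequences in total)."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


dimers = random_sequence_set(2)
print(len(dimers))   # 16, i.e. 4**2
print(4 ** 15)       # 1073741824 distinct 15-mers
```

Explicit enumeration is practical only for small N; for the 15-mers mentioned in connection with FIGURE 8 the set already exceeds one billion members.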

VI. DETAILED DESCRIPTION OF THE INVENTION
A. Overview

The present invention relates to a method for the assessment of relative 
amounts of nucleic acid sequences in samples derived from a plurality of different 
sources. 

More specifically, the invention relates to a method using beads having
attached to their surface unique oligonucleotides or nucleic acid fragments,
collectively referred to as capture oligonucleotides or capture fragments, to select
specific labeled nucleic acid sequences. A collection of a plurality of such beads,
each linked to multiple copies of an oligonucleotide of unique sequence, is used to
capture nucleic acids having a specific sequence to assess the relative abundance of
specific nucleic acid sequences and to retrieve and analyze sequences with defined
relative abundance.

More specifically, the methods of the invention may be used to compare
quantitatively the amount of specific nucleic acid sequences in at least two samples
derived from different sources, e.g., different cell or tissue types, different disease
or developmental stages, and the like. Nucleic acids from the two samples are
labeled in such a fashion that the signals can be distinguished and compared
[...] analysis in cases where a fluorescent label is linked to the target nucleic acids,
according to [...] the first label and the second label, which is indicative of
the relative amounts of transcript contained in the two sources. The beads, along
with the bound nucleic acid having a particular expression profile, are retrieved, and
the nucleic acid is eluted and analyzed, for example by DNA sequence analysis.
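
The comparison of two distinguishable labels on a single bead can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name classify_bead, the two-fold threshold, and the category labels are hypothetical and are not taken from the disclosure, and the signals are assumed to be background-corrected intensities.

```python
def classify_bead(signal_1, signal_2, fold=2.0):
    """Classify one bead by the ratio of its first-label to second-label signal.

    signal_1 and signal_2 are assumed to be background-corrected intensities;
    the fold threshold is an arbitrary illustrative choice.
    """
    if signal_1 <= 0 or signal_2 <= 0:
        return "undetermined"
    ratio = signal_1 / signal_2
    if ratio >= fold:
        return "enriched in source 1"
    if ratio <= 1.0 / fold:
        return "enriched in source 2"
    return "similar in both sources"


print(classify_bead(800.0, 200.0))   # enriched in source 1
print(classify_bead(100.0, 500.0))   # enriched in source 2
print(classify_bead(300.0, 310.0))   # similar in both sources
```

Beads falling into a chosen category would then be the ones retrieved for elution and analysis.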

B. Generation Of Beads Comprising Capture Oligonucleotides Or
Nucleic Acids

Solid Supports/Beads. The solid support materials to which the
capture oligonucleotides or nucleic acids are attached are referred to herein as
beads. Such beads may have a wide variety of shapes and may be composed of
numerous materials, as defined, supra. Briefly, solid supports/beads used with the
invention typically have a homogeneous size between 1 and 100 microns, and include
microparticles made of controlled pore glass (CPG), highly cross-linked
polystyrene, acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and
the like. See, among other references, Meth. Enzymol., Section A, pages 11-147,
vol. 44 (Academic Press, New York, 1976); U.S. Patent No. 4,678,814; U.S.
Patent No. 4,413,070. Beads also include commercially available
nucleoside-derivatized CPG and polystyrene beads, e.g., available from Applied
Biosystems, Foster City, CA; derivatized magnetic beads; polystyrene grafted with
polyethylene glycol, e.g., TentaGel™, Rapp Polymere, Tübingen, Germany, and the like.

Selection of the bead characteristics, such as material, porosity, size, shape,
and the like, and the type of linking moiety employed depends on the conditions
under which the capture oligonucleotides are used. For example, in applications
involving successive processing with enzymes, supports and linkers that minimize
steric hindrance of the enzymes and that facilitate access to substrate are preferred.
Other important factors to be considered in selecting the most appropriate
microparticle support include size, uniformity, efficiency as a synthesis support,
degree to which the surface area is known, and optical properties, e.g.,
autofluorescence. Typically, a population of discrete particles is employed such that
each has a uniform population of the same oligonucleotide or nucleic acid fragment
(and no other). However, beads with spatially discrete regions, each containing a
uniform population of the same oligonucleotide or nucleic acid fragment (and no
other), may be employed. In the latter embodiment, the area of the regions may
vary according to particular applications. Preferably, such regions are spatially
discrete so that signals generated by events, e.g., fluorescent emissions, at adjacent
regions can be resolved by the detection system being employed.

In the preferred embodiments of the invention, beads are typically composed
of glass, plastic, or carbohydrate, and have chemical and spectral properties
appropriate for their use in nucleic acid attachment and fluorescence activated cell
sorter analysis. For example, if they are used with chemical synthesis of
oligonucleotides, they must withstand prolonged exposure to organic solvents such
as acetonitrile. They can be chemically derivatized so that they support the initial
attachment and extension of nucleotides on their surface. The beads also will
possess autofluorescence profiles and mass densities that permit their use on a
fluorescence activated cell sorting machine. In general, the solid support may be
composed of some form of glass (silica), plastic (synthetic organic polymer), or
carbohydrate (sugar polymer). A variety of materials and shapes may be used,
including beads, pellets, disks, capillaries, hollow fibers, needles, solid fibers,
cellulose beads, pore-glass beads, silica gels, polystyrene beads optionally
cross-linked with divinylbenzene, grafted co-poly beads, poly-acrylamide beads, latex
beads, dimethylacrylamide beads optionally cross-linked with N,N'-bis-acryloyl
ethylene diamine, glass particles coated with a hydrophobic polymer, etc., i.e., a
material having a rigid or semirigid surface.

Attachment Of Capture Oligonucleotides To Beads: Linker Chemistry.
Capture oligonucleotides may be synthesized directly on the bead upon which they
will be used, or they may be separately synthesized and attached to a bead for use,
e.g., as set forth in Lund et al., 1988, Nucleic Acids Research 16:10861-10880;
Albretsen et al., 1990, Anal. Biochem. 189:40-50; Wolf et al., 1987, Nucleic Acids
Research 15:2911-2926; and Ghosh et al., 1987, Nucleic Acids Research
15:5353-5372.

The oligonucleotides may be attached to the beads using a variety of
standard methods. Conveniently, the bond to the bead may be permanent, but a
linker between the bead and the product may also be provided which is cleavable
such as exemplified in Example 1. Exemplary linking moieties for attaching and/or
synthesizing tags on microparticle surfaces are disclosed in, e.g., Pon et al., 1988,
Biotechniques 6:768-775; Webb, U.S. Patent No. 4,569,774; Barany et al., PCT
Patent Application PCT/US91/06103; Brown et al., 1989, J. Chem. Soc. Commun.
:891-893; Damha et al., 1990, Nucleic Acids Research [...]; Beattie et
al., 1993, Clinical Chemistry 39:719-722; Maskos and Southern, 1992, Nucleic
Acids Research 20:1679-1684.

Desirably, when the product is permanently attached, the link to the bead
will be extended, so that the bead will not sterically interfere with the binding of the
product during screening. Various links may be employed, including hydrophilic
links, such as polyethyleneoxy, saccharide, polyol, esters, amides, saturated or
unsaturated alkyl, aryl, combinations thereof, and the like.

Functionalities present on the bead may include hydroxy, carboxy,
iminohalide, amino, thio, active halogen (Cl or Br) or pseudohalogen (e.g., -CF3,
-CN, etc.), carbonyl, silyl, tosyl, mesylates, brosylates, triflates or the like. In some
instances the bead may have protected functionalities which may be partially or
wholly deprotected prior to each stage, and in the latter case, reprotected. For
example, amino acids may be protected with a carbobenzoxy group as in
polypeptide synthesis, hydroxy with a benzyl ether, and the like.

In many cases, detachment of the capture oligonucleotide may be desired
and there are numerous functionalities and reactants which may be used for
detaching. Conveniently, ethers may be used, where substituted benzyl ether or
derivatives thereof, e.g., benzhydryl ether, indanyl ether, and the like may be
cleaved by acidic or mild reductive conditions. Alternatively, one may employ
β-elimination, where a mild base may serve to release the product. Acetals, including
the thio analogs thereof, may be employed, using mild acid, particularly in the
presence of a capturing carbonyl compound. By combining formaldehyde, HCl and
an alcohol moiety, an α-chloroether is formed. This is then coupled with a
hydroxy functionality on the bead to form the acetal. Various photolabile linkages
may be employed, such as o-nitrobenzyl, 7-nitroindanyl, 2-nitrobenzhydryl ethers or
esters, and the like. Esters and amides may serve as linkers, where half-acid esters
or amides are formed, particularly with cyclic anhydrides, followed by reaction with
hydroxyl or amino functionalities on the bead, using a coupling agent such as a
carbodiimide. Peptides may be used as linkers, where the sequence is subject to
enzymatic hydrolysis, particularly where the enzyme recognizes a specific sequence.
Carbonates and carbamates may be prepared using carbonic acid derivatives, e.g.,
phosgene, carbonyl diimidazole, etc. and a mild base. The link may be cleaved
using acid, base or a strong reductant, e.g., LiAlH4, particularly for the carbonate
esters.

If the capture oligonucleotides are chemically synthesized on the bead, see,
infra, the bead-oligo linkage must be stable during the deprotection step. During
standard phosphoramidite chemical synthesis of oligonucleotides, a succinyl ester
linkage is used to bridge the 3' nucleotide to the resin. This linkage is readily
hydrolyzed by NH3 prior to and during deprotection of the bases. Thus, the finished
oligonucleotides are released from the resin in the process of deprotection.

In specific embodiments of the invention, the capture oligonucleotides are
linked to beads via: (1) a siloxane linkage to Si atoms on the surface of glass
beads; (2) a phosphodiester linkage to the phosphate of the 3'-terminal nucleotide
via nucleophilic attack by a hydroxyl (typically an alcohol) on the bead surface; or
(3) a phosphoramidate linkage between the 3'-terminal nucleotide and a primary
amine conjugated to the bead surface.

In a first embodiment, glass beads are treated with
3-glycidoxypropyltrimethoxysilane to generate a terminal epoxide conjugated via a
linker to Si atoms on the glass. In a second step, the epoxide is opened with either
water or a diol to generate alcohols. Maskos and Southern, 1992, Nucleic Acids
Research 20:1679-1684. The resulting siloxane linkage is relatively stable to base
hydrolysis. Glass beads are a necessary starting material to produce hydroxyl
groups suitable to begin cycles of phosphoramidite chemistry in a conventional
automated DNA synthesizer. In some preferred applications, commercially
available controlled-pore glass (CPG) or polystyrene supports are employed as
beads. Such supports are available with base labile linkers and initial nucleosides
attached from, e.g., Applied Biosystems (Foster City, CA). Alternatively, non-porous
glass beads, e.g., Ballotini spheres, are employed (Maskos and Southern, 1992,
Nucleic Acids Research 20:1679-1684).

In a second embodiment, the linkage is created by the reaction of primary
amines with phosphoramidite nucleotides to produce a base-stable linkage. Pon et
al., 1988, Biotechniques 6:768-775. In the first step of the reaction an N-P linkage
is formed due to nucleophilic attack by nitrogen on phosphorus. This linkage is
oxidized in a subsequent step to the phosphoramidate, a stable chemical linkage.
Beads that are functionalized with surface primary amines can be obtained from
commercial sources.

In a third embodiment, the capture oligonucleotides are attached to the bead
via a phosphodiester bond generated by standard phosphoramidite synthesis
utilizing the attack of bead-linked hydroxyl oxygens on the nucleotide phosphorus
to produce a phosphodiester bond, following oxidation with molecular iodine.
Others have utilized this reaction to generate stable linkages (Needels et al.,
1993, Proc. Natl. Acad. Sci. U.S.A. 90:10700-10704). The key step is the
derivatization of appropriate beads such that they contain significant numbers of
hydroxyl functional groups on their surface. It is possible to purchase such
functionalized beads from a variety of commercial sources; the capture
oligonucleotides may be synthesized chemically on the surface of these
functionalized beads.

Generally, standard synthesis chemistries are used, such as phosphoramidite
chemistry, as disclosed in Beaucage and Iyer, 1992, Tetrahedron 48:2223-2311;
Molko et al., U.S. Patent No. 4,980,460; Koster et al., U.S. Patent No. 4,725,677;
Caruthers et al., U.S. Patent Nos. 4,415,732; 4,458,066; and 4,973,679.
Alternative chemistries, e.g., resulting in non-natural backbone groups, such as
phosphorothioate, phosphoramidate, and the like, may also be employed,
provided that the resulting capture oligonucleotides are capable of specific
hybridization.

As described in Shortle et al., PCT Application PCT/US93/03418,
phosphoramidite chemistry may be used. 3' phosphoramidite oligonucleotides are
prepared according to standard procedures described. Synthesis proceeds as
disclosed by Shortle et al., or in direct analogy with the techniques employed to
generate diverse oligonucleotide libraries using nucleosidic monomers, e.g., as
disclosed in Telenius et al., 1992, Genomics 11:718-725; Welsh et al., 1991,
Nucleic Acids Research 19:5275-5279; Grothues et al., 1993, Nucleic Acids
Research 21:1321-1322; Hartley, European Patent Application No. 90304496.4;
Lam et al., 1991, Nature 354:82-84; Zuckerman et al., 1992, Int. J. Pept. Protein
Res. 40:498-507. Generally, these techniques call for the application of mixtures of
the activated monomers to the growing oligonucleotides during the coupling
process.

Oligonucleotide Extension/Amplification Strategy. A prerequisite of the
invention disclosed herein is that each individual bead have many copies of one, and
preferably only one, and no more than a few, unique capture oligonucleotide or
nucleic acid sequences displayed on its surface. This can be achieved in a variety of
ways.

In one embodiment of the invention, the capture oligonucleotides are
synthesized by constraining the PCR to the surface of the beads. For example, the
beads may be coated with two amplification primers, one "forward" primer and a
"reverse" primer, which are complementary to a target nucleic acid sequence. In
solution, these two primers are capable of amplifying the target nucleic acid. When
these primers are on a bead coupled via their 5' ends they are not freely diffusible in
solution. These primers will prime synthesis of new molecules while attached to the
bead. Thus, potential template molecules must diffuse to the bead and anneal to the
attached primer(s). When this happens, a complementary strand can be synthesized
on the template using a DNA polymerase exactly as the reaction occurs during
normal solution phase PCR. Following extension of the new strand, denaturation
releases the original template molecule, but leaves the newly synthesized strand
attached to the bead via its priming oligonucleotide. In a second round of annealing
and extension, the new strand can fold back onto the bead surface to hybridize with
the reverse primer, forming a bridge. This bridge can be converted into
double-stranded DNA by a further round of extension with a polymerase. The
denaturation step results in two complementary single strands attached to the bead,
one derived from the forward primer, the other one from the reverse. In subsequent
rounds of amplification, the two strands reanneal with other primers on the bead's
surface. If a single template molecule begins the amplification on a given bead, and
if the Watson strands are released by selective hydrolysis of the Watson primer
linker, for example, the bead ends up covered by many copies of a single sequence
(within the limits of PCR). This method could be used to generate a family of beads,
each having a unique sequence representing, for instance, a clone from a cDNA
library. In this embodiment, unique nucleic acid fragments attached to a solid
support, such as a bead, may have a length of from about 50 to about 5,000
nucleotides.

In preferred embodiments, the family of beads each with a single type of
capture oligonucleotide sequence attached to its surface is created by chemical
synthesis in a "split synthesis" mode. More specifically, a population of beads with
capture oligonucleotides of arbitrary length and random sequence is generated as
follows: A collection of beads numbering in the millions is split into four groups
designated (a), (c), (g), and (t). Each group serves as the basis for deposition of the
first nucleotide, which is different for all groups. Thus, group (a) receives an
adenosine moiety, group (c) receives a cytosine, group (g) receives a guanosine,
and group (t) receives a thymidine. Following completion of the first synthesis step
the four groups of beads are pooled into a common pot, mixed and redistributed
(split) into each of the four initial groups. Thus, one quarter of group (a) is left in
the original group's location, one quarter is mixed with the remaining quarter of
group (c), one quarter with group (g), etc. A second round of synthesis is then
completed placing an adenosine on the beads in the group (a) location, a cytosine
on the beads in the group (c) location, etc. This process can be repeated several
times to generate a population of beads that, overall, has random sequence (equal
amounts of A, C, G and T at each base position), but with each bead having a
homogeneous population of capture oligonucleotides on its surface. See, FIGURE
6. The subdivision and reassortment of beads during synthesis can be varied to
skew the population of beads away from a random sequence distribution. The
number of bases per oligonucleotide (a constant for each synthesis) can be varied
from synthesis to synthesis. Using this approach, oligonucleotides of a determined
length, typically between approximately ten (10) and fifty (50) nucleotides long,
preferably between approximately ten (10) and forty (40) nucleotides long, may be
produced. In one preferred embodiment of the invention, oligonucleotides between
approximately ten (10) and twenty (20) nucleotides long are produced. In another
preferred embodiment of the invention, capture oligonucleotides having a length of
from about twelve (12) to about thirty (30) nucleotides and which comprise a
stretch of from about 10 to about 20 nucleotides of random sequence are produced.
In yet another preferred embodiment of the invention, 24-mers composed of three
8-mer units are produced. As an alternative, a defined sequence of a desired
number of bases may be added to the growing capture oligonucleotide attached to
the surface of the beads at any stage in the synthesis. Thus, the capture
oligonucleotides may contain certain regions of identity and certain regions of
known distinguishable sequence.
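
The split synthesis procedure described above can be simulated with a short Python sketch. This is a toy model under stated assumptions: the function name split_synthesis, the bead count, and the use of a seeded pseudorandom shuffle to stand in for physical mixing are all choices made for illustration, and each bead is modeled as carrying a single sequence.

```python
import random


def split_synthesis(num_beads, length, seed=0):
    """Simulate split-and-recombine synthesis; returns one sequence per bead."""
    rng = random.Random(seed)
    beads = [""] * num_beads                  # the oligo carried by each bead
    quarter = num_beads // 4
    for _ in range(length):
        rng.shuffle(beads)                    # pool the groups and mix
        next_round = []
        for i, base in enumerate("ACGT"):
            # split into four groups; the last group takes any remainder
            start = i * quarter
            end = (i + 1) * quarter if i < 3 else num_beads
            next_round.extend(seq + base for seq in beads[start:end])
        beads = next_round
    return beads


beads = split_synthesis(num_beads=10000, length=4)
# Number of beads, and number of distinct 4-mers represented among them.
print(len(beads), len(set(beads)))
```

Each simulated bead ends with exactly one sequence, while each base position across the population carries equal amounts of A, C, G and T whenever the bead count is divisible by four.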

In some cases it is desirable to generate beads with capture oligonucleotides
that are not random in sequence, yet nonetheless contain among them a
considerable degree of diversity. This is accomplished by parallel chemical
syntheses. However, when a high diversity of capture oligonucleotides is desired,
this becomes extremely expensive and labor-intensive with current technology.
However, as provided by the present invention, a combinatorial diversity may be
generated by a modified "pool and split" synthesis approach. See, FIGURE 7. For
example, with this approach two split and recombine steps on one hundred (100)
synthesis columns would produce one million different 24-mers. Specifically, in a
first series of couplings, one hundred (100) columns are used to synthesize one
hundred (100) different 8-mers that remain attached to the beads in each column.
After the eighth coupling round, the contents of each column are pooled and
redistributed (split) into one hundred (100) new columns. Thus, all combinations of
the contents of the one hundred (100) columns are generated, with a final number
of columns again equal to one hundred (100). Eight further couplings are
completed in these new columns, each column receiving a unique series of
couplings. This second set of couplings generates 16-mers (eight plus eight) in one
hundred (100) columns, with a population diversity of ten thousand (10,000). After
an additional "pool and split" operation on the column contents into the final set of
one hundred (100) columns, eight further couplings are completed. This results in a
final product of one million different bead types, each with many copies of a unique
24-mer. Note that any two bead types differ in at least one 8-mer unit, so no two
sequences are more similar to each other than the similarity between two of the
8-mers. Thus, each sequence can be chosen to
differ from any other sequence, in principle, by several mismatches. This drastically
improves the specificity of the capture oligonucleotides.
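
The arithmetic of the modified "pool and split" approach can be checked with a small Python sketch. The sketch assumes, for simplicity, that the same set of units is used in every round, which the text does not require; the three 8-mer units shown are hypothetical sequences chosen only to keep the example small, and the function name combinatorial_tags is likewise an assumption.

```python
from itertools import product


def combinatorial_tags(units, rounds):
    """All concatenations of `rounds` units, one unit drawn per coupling series."""
    return ["".join(combo) for combo in product(units, repeat=rounds)]


# Toy scale: 3 hypothetical 8-mer units instead of 100 columns.
units = ["AACCGGTT", "CTAGTCAG", "GGATACCA"]
tags = combinatorial_tags(units, 3)
print(len(tags), len(tags[0]))   # 27 24
print(100 ** 2, 100 ** 3)        # 10000 1000000
```

With one hundred columns the same counting gives ten thousand 16-mers after the second series of couplings and one million 24-mers after the third.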

Identifier Tags

Some of the specific applications disclosed herein rely on "tracking"
of specific individual nucleic acid molecules. This can be accomplished by attaching
sequence identifier tags to each individual nucleic acid sequence comprising a
mixture.

Sequence identifier tags are unique oligonucleotide sequences that allow
identification and recovery of specific sequences in a complex population of target
nucleic acids. For example, in the case of a cDNA library that contains one million
individual clones, it is optimal to construct the library such that each clone
possesses its own unique identifier tag.

In order to minimize the background signal, it may be necessary for the
identifier sequences to be designed in such a way that cross hybridization is
minimized. This can be accomplished by synthesis of oligonucleotides which are
composed of pluralities of "units". Generally, such "units" range in size from about
two (2) to about thirty (30) nucleotides, preferably from about two (2) to about twelve
(12) nucleotides, and may be synthesized using the above described
"split/recombine" synthesis method. In one preferred embodiment of the invention,
sequence identifier tags comprise a combination of between two (2) and six (6)
sequence units in tandem, each unit consisting of from about seven (7) to about
fifteen (15) nucleotides. The total length of the oligonucleotide may thus vary from
about fourteen (14) to about ninety (90) nucleotides.

Units in the range of from about seven (7) to about nine (9) nucleotides are
preferred, as they provide a perfect compromise between the complexity which can
be achieved and inherent specificity. For example, using one hundred (100)
synthesis columns in a split/recombine synthesis approach, a mixture of 24-mers
composed of three 8-mer units will have a complexity of 1x10^6, see, supra. Thus,
while high complexity can readily be achieved, the final 24-mer oligonucleotides can
be hybridized with reasonably high specificity, as each individual oligonucleotide
should differ from the other 24-mers in the population by several mismatches,
preferably in at least eight (8) positions. Thus, there should be minimal cross-
hybridization. The length of the perfectly matched hybrids, 24 basepairs, also
permits relatively high temperatures to be used for hybridization and washing. This
characteristic is valuable in promoting more rapid hybridization reactions and
increased specificity. A related concept for the generation of oligonucleotide
identifier tags which exhibit minimal cross hybridization is disclosed in Brenner,
PCT Patent Application Nos. PCT/US95/12791, PCT/95/03678, and
PCT/95/12678, hereby incorporated by reference in their entirety. Specifically,
Brenner discloses oligonucleotide tags consisting of a plurality of subunits three to
six nucleotides in length selected from a minimally cross-hybridizing set. Although
the identifier tags provided by Brenner may be used for the methods of the present
invention, slightly longer units, as discussed above, ranging from seven (7) to nine
(9) base pairs are preferred for applications specifically disclosed herein.
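
The arithmetic behind the unit-based tag design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four toy 8-mer units are hypothetical and are not sequences from this disclosure. The sketch shows that the number of tags equals the number of units per position raised to the number of positions, and that tags assembled from units differing at every position differ from one another in at least one full unit length.

```python
from itertools import product


def tag_complexity(units_per_position: int, positions: int) -> int:
    """Number of distinct tags produced by split/recombine synthesis."""
    return units_per_position ** positions


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_tags(units, positions):
    """Concatenate every ordered combination of units into a tag."""
    return ["".join(combo) for combo in product(units, repeat=positions)]


def min_pairwise_mismatches(seqs):
    """Smallest Hamming distance between any two sequences in the set."""
    seqs = list(seqs)
    return min(
        hamming(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:]
    )


# Hypothetical toy units (assumption): four 8-mers that differ from one
# another at all eight positions. Real units would avoid homopolymers.
TOY_UNITS = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG", "TTTTTTTT"]

toy_tags = build_tags(TOY_UNITS, 3)      # 4**3 = 64 tags of 24 nt each
full_scale = tag_complexity(100, 3)      # 100 columns, three couplings
```

With one hundred units per position the same calculation gives the one million 24-mers discussed above; the toy set is kept small so that the pairwise mismatch check runs quickly.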

Generally, oligonucleotides are synthesized using standard techniques, see,
supra, Section VI.B. In many instances, the oligonucleotide tags of the invention
may be conveniently synthesized on an automated DNA synthesizer, e.g., an
Applied Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA
synthesizer, using the above described and referenced standard chemistries. See,
Section VI.B.

Attachment Of Tags To DNA Or cDNA. Many approaches known to the
skilled artisan may be used to attach the identifier onto genomic or cDNA. In
the following, preferred methods are described.

One preferred method employs a first strand cDNA primer which comprises
a random sequence, e.g., a hexamer, followed by a segment comprised of a defined
number of "units" of defined length (e.g. three 8-mer units, corresponding to the 24- 
mers described above), and, optionally, a constant sequence segment containing a 
restriction endonuclease recognition sequence. The resulting first strand primer 
thus has a length of about thirty (30) to fifty (50) base pairs, with a random 3' 
segment as a means for randomly primed cDNA synthesis, followed by, e.g., a one 
million fold degenerate 24-mer as identifier tag, and an optional 5' sequence shared 
among all primers containing a restriction endonuclease recognition sequence useful 
in cloning. Alternatively, if oligo(dT)-primed synthesis is desired, the primer 
contains 8-16 T's at its 3' end instead of the random hexamer. 
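
The primer layout described above can be illustrated with a short Python sketch. All sequences here are assumptions chosen for illustration: the three 8-mer units are invented, the constant segment is simply the NotI recognition sequence, and the helper names are hypothetical. The primer is written 5' to 3', so the optional constant segment comes first, the identifier tag second, and the priming segment (random hexamer or oligo(dT)) last.

```python
import random

BASES = "ACGT"


def random_segment(length, rng):
    """Return a random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def first_strand_primer(tag, priming_segment, constant_5prime=""):
    """Assemble a primer written 5'->3': constant + tag + 3' priming segment."""
    return constant_5prime + tag + priming_segment


# Hypothetical inputs (assumptions for illustration only).
rng = random.Random(0)
tag = "ACGTTGCA" + "GGATCCTA" + "TTGACCGA"    # three 8-mer units -> 24-mer
constant = "GCGGCCGC"                         # NotI recognition sequence
random_primed = first_strand_primer(tag, random_segment(6, rng), constant)
oligo_dt_primed = first_strand_primer(tag, "T" * 12, constant)
```

Both example primers fall within the thirty to fifty nucleotide range given above (38 and 44 nucleotides respectively).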

Such a first strand primer is used to reverse transcribe the first strand of
cDNA from mRNA (or polymerize on genomic DNA) prepared from a source of
interest under conditions suited for randomly primed synthesis. The first cDNA
strand is then converted into second strand cDNA in such a fashion that it can be
directionally cloned in a plasmid or phage vector. Cloning techniques generally
known in the art are employed. See, e.g., Sambrook et al., supra. Briefly, the
cDNA is ligated to the vector, either using specific sticky end restriction
endonuclease sites (in cases where such restriction enzyme recognition sequences
are included at the 5' end of the first strand synthesis primer), or by blunt end
subcloning. Typically, the phage or plasmid vector contains a selectable marker.
The plasmids are transformed into suitable bacterial cells, e.g., E. coli, and clones are
selected. The library of clones, typically numbering at least one million independent
colonies or plaques, is expanded and DNA is isolated. The obtained DNA then



of the random primers), or from flanking vector sequences. The amplified cDNA
now contains representatives from roughly one million clones, each labeled with a
unique (or nearly unique) tag, e.g., the attached 24-mer.




In an alternative embodiment, sequence identifier tags are attached by
ligation of linker DNA molecules onto the ends of genomic DNA fragments or
cDNAs. Several possible methods could be employed. One specific example
involves ligation of a vector (e.g., a plasmid) that contains the identifier sequence
tags flanking the cloning site. The population of cloning vector molecules is itself
degenerate, since there are, e.g., one million different sequences (corresponding to
the one million identifier tags) represented among them. After ligation, e.g., of
genomic DNA inserts, prepared, e.g., by random shearing, into the vector
population and transformation into E. coli host cells, a set of library clones can be
isolated, each of which contains a unique or nearly unique identifier sequence
attached to it.

D. Labeling The Target Nucleic Acid 

In accordance with the invention, the target nucleic acids are labeled
with a marker, preferably a visual marker, including chromophores, fluorophores
and the like.

In preferred embodiments, the target nucleic acid is labeled with
fluorophores to permit detection by instruments like the automated fluorescence
activated cell sorter or cell scanner. Such machines allow quantitative measurement
of fluorescence signals in multiple channels (i.e., at multiple wavelengths) and can
compute fluorescence intensity ratios at different wavelengths; typically the range
runs between 400-600 nm. Designed to measure fluorescence in cells or on cell
surfaces, the machines can be readily adapted to monitor fluorescence on beads of
various types [...]

[...] example, PCR primers, labeled at their 5' ends with, e.g., a fluorophore such as HEX
[...]
can be rendered single stranded such that the remaining single strands contain the
fluorophore, and can be used for hybridization to probe sequences on beads.

Alternatively, the fluorophores may be coupled to nucleic acid molecules by
ligation of labeled linkers, by incorporation of labeled nucleotides via polymerases,
or possibly by more nonspecific chemical reactions. A further alternative involves
incorporation of modified bases that can be bound by a fluorophore-containing
ligand, e.g., biotinylated bases that can be bound with fluorophore-conjugated
avidin.

E. Hybridization Of Probes And Target Nucleic Acids 

Hybridization and washing conditions for the experiments described
below are critical. The conditions have to be such that they promote the formation
of perfectly matched duplexes between the probes, i.e., the capture oligonucleotides
attached to the beads, and the target, i.e., the nucleic acid molecule complements in
[...] guidance in the literature for creating these
conditions. Exemplary references providing such guidance include Wetmur, 1991,
Critical Reviews in Biochemistry and Molecular Biology 26:227-259; Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989); and the like. Preferably, the hybridization
conditions are sufficiently stringent so that only perfectly matched sequences form
stable duplexes.

Relevant issues for choosing the hybridization conditions include the
specificity or selectivity of the hybridization and the sensitivity of the method. The
issue of specific hybridization and its optimization has been described and analyzed
[...] physical measurement processes, a key concept is the signal to noise ratio of the
[...] The signal to noise ratio for [...]
described herein can be estimated by theory, incorporating base composition of the
hybridizing sequences, length of sequences, salt concentration of the hybridization
buffer, temperature, and the like. Generally, such calculations permit a rough
estimate to be obtained which must be refined for practical reasons by a series of
empirical measurements. For example, a specific sequence can be doped into the
mixture of nucleic acids, along with appropriate cognate beads. A variety of
hybridization and washing conditions can be examined, where the readout is the
specific fluorescence signal on the cognate beads, compared with the signal on
noncognate beads. The goal of this procedure is to arrive at conditions where the
ratio of the cognate signal to the noncognate signal is maximal. The parameters
that are most easily manipulated are temperature and salt concentration. Low
stringency of hybridization involves high salt and/or low temperatures. High
stringency, conversely, involves low salt and/or high temperatures. It is also
possible to carry out a first wash at a relatively nonstringent condition, followed by
a fluorescence activated cell sorting analysis. The flow through beads can then be
[...] cell sorting experiment. In this way, the fluorescence intensity ratios of the beads
can be examined under two or more conditions and individual beads can be culled
from the population according to desired ratios under these different conditions.
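
The empirical optimization described above amounts to choosing, among the conditions tested, the one with the largest ratio of cognate to noncognate signal. The following Python sketch shows that selection step only. The function name, the condition labels and all signal values are invented for illustration and are not measurements from this disclosure.

```python
def best_condition(measurements):
    """Return the condition with the highest cognate/noncognate signal ratio.

    measurements maps (temperature_C, salt_M) to a tuple of
    (cognate_signal, noncognate_signal) in arbitrary fluorescence units.
    """
    def ratio(item):
        cognate, noncognate = item[1]
        # A zero background is treated as an unbounded ratio.
        return cognate / noncognate if noncognate > 0 else float("inf")

    condition, _ = max(measurements.items(), key=ratio)
    return condition


# Hypothetical readings (assumption): lower stringency gives more total
# signal but also more background on noncognate beads.
example = {
    (45, 1.0): (9000.0, 3000.0),   # low stringency, ratio 3
    (55, 0.5): (7000.0, 350.0),    # intermediate, ratio 20
    (65, 0.1): (1500.0, 150.0),    # high stringency, ratio 10
}
chosen = best_condition(example)
```

In practice the absolute cognate signal also matters, since a condition with an excellent ratio but a signal below the detection limit of the instrument would not be useful.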

Sensitivity is understood to be the minimum amount of real target nucleic
acid that can be detected reliably on the bead surface. For example, a bead that
should selectively bind sequence X, will reveal progressively lower signals for X as
the concentration of X is reduced. In the case of fluorescence activated cell sorting
analysis, the amount of X on the bead is measured by X-specific fluorescence.

Selectivity is understood to be the ability of the X-specific bead to bind X
[...] presented during hybridization. For example, if X is mixed with sequence Y in
different proportions, and each is labeled with the same chromophore, the degree of
selectivity determines the ratio of X-signal to Y-signal on an X-specific bead
following hybridization and washing. The limit of the sensitivity is the point at
which the X-signal is no longer detectable above the background noise caused by
hybridization of Y. The limit of sensitivity depends on both the amount of
hybridized X on the bead, and the amount of non-specific binding of Y on the bead.
The signal to noise issue in the context of hybridization experiments is best
formulated in terms of chemical equilibrium, as it is defined by the difference in
binding energies under certain conditions between X and Y to the X-beads. If the
difference is, e.g., 4.2 kcal/mole, at equilibrium 1000 fold more X should be bound
than Y.
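
The relation between a binding energy difference and the equilibrium ratio of bound species follows the Boltzmann factor exp(ddG/RT). The short Python sketch below evaluates it. The temperatures are assumptions, since none is stated for this example, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def equilibrium_ratio(delta_delta_g_kcal, temperature_c):
    """Ratio of bound X to bound Y for a given binding energy difference."""
    temperature_k = temperature_c + 273.15
    return math.exp(delta_delta_g_kcal / (R_KCAL_PER_MOL_K * temperature_k))


# Assumed temperatures; both give a ratio on the order of one thousand.
ratio_25c = equilibrium_ratio(4.2, 25.0)
ratio_37c = equilibrium_ratio(4.2, 37.0)
```

Because the ratio depends exponentially on the energy difference, each additional mismatch that destabilizes the Y duplex improves discrimination multiplicatively rather than additively.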

Another key issue in the hybridization process relates to the rate at which X
hybridizes to the X-bead. This rate depends on numerous factors, two of the most
important being the concentration of X in solution, and the number of X-specific
capture oligonucleotides attached to the bead surface. In reactions where X is
present in vast excess, the reaction can be thought to proceed in a pseudo-first
order manner, that is, the concentration of X changes little as the capture
oligonucleotides on the bead anneal to the X molecules. Under the conditions of
the methods of the invention, the reaction proceeds according to second order
kinetics because X is present at low concentration, at a fraction of the total
target nucleic acid that is presented in the hybridization reaction.

Hybridization reactions that involve one hybridizing species immobilized to
a surface behave slightly differently from the ideal chemical reaction involving
complex formation between two freely diffusible reactants. Nevertheless, it is
useful to consider the concentrations of the hybridizing species, the capture
oligonucleotides on the bead surface and the target nucleic acid in solution, to help
understand the utility of the present invention.

To maximize the signal to noise ratio, it is preferred to choose hybridization
conditions that permit maximum binding of the specific hybridization target
sequence, and minimize the binding of the nonspecific target sequences. Nucleic
acid hybridization is a complex process that depends on a variety of factors,
including sequence composition and length, ionic strength, pH, and temperature.
Propitious choice of the identifier tags is a first step in achieving a good signal to
noise ratio. The tag sequences should be chosen such that each one has roughly the
same G/C content as every other. In addition, secondary structure in the tags
should be minimized by design. Once the sequences are selected, other variables
such as salt concentration and temperature can be tested for hybridization and
washing so that the signal to noise ratio is maximized.
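
Selecting tags with roughly equal G/C content can be sketched as a simple filter. The target fraction and tolerance below are arbitrary assumptions for illustration, as are the function names and the example sequences; the text above does not specify numerical values.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_by_gc(tags, target=0.5, tolerance=0.1):
    """Keep only tags whose G/C fraction lies within tolerance of target."""
    return [t for t in tags if abs(gc_fraction(t) - target) <= tolerance]


# Hypothetical candidate units (assumption).
candidates = ["GGCCAATT", "GGGGCCCC", "ATATATAT", "GCGCATAA"]
balanced = filter_by_gc(candidates)
```

A fuller design step would also screen for secondary structure, which this sketch does not attempt.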

The kinetics of the process is critical. In order to detect rare molecular
species in the target nucleic acid mixture, it is necessary to include high
concentrations of target and/or probe in the reaction, and/or let the reaction
proceed for a long time. Indeed the product of initial concentrations of the reaction
species and the time of reaction (the "Cot") is a key parameter that must be
considered. A reasonable limit for hybridization time is 24 hours. It is often not
practical to wait longer than one day for the hybridization reaction to proceed. In
addition, there is a limit as to the concentration of DNA that can be manipulated in
solution, typically not more than 10 mg/ml.

In the case where the two hybridizing species are diffusible, a rough formula
for predicting the rate of the reaction is given by:

(1/X)(Y/5)(Z/10) x 2 = number of hours to achieve Cot1/2 (50% formation
of duplex),

where X = mass of nucleic acid sequence in micrograms,

where Y = complexity of nucleic acid sequence in kilobases (complexity
usually is the length of the sequence),

and where Z = volume of the reaction in milliliters.

Thus for a reaction that involves 10^8 Watson molecules and 10^8 Crick
molecules of 500 basepairs length in a reaction volume of 10 microliters, Cot1/2 is
expected to be reached in about 4 hours. If, however, one of the complementary
molecules, e.g., the Crick species, is attached to a solid support, this calculation is
not necessarily valid. To compensate for the lack of diffusibility of the bead-
conjugated species, the sample must be continuously mixed. If the mean mixing
velocity is comparable to the mean diffusion velocity of Crick molecules in the
reaction, the reaction rate can be approximated by the same equation given above.
A more rigorous treatment must include other aspects of the reaction, e.g., the fact
that the bound nucleic acid molecules have fewer degrees of freedom than
molecules in solution. Longer linker sequences can be added to separate the
hybridizing oligonucleotide sequences from the bead surface to improve reaction
rates if necessary (Lund et al., 1988, Nucleic Acids Res 16:10861-10880; Day et
al., 1991, Biochem. J. 278:735-740).
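
The rough rate formula can be evaluated with a few lines of Python. This sketch rests on several assumptions that should be read as such: the reaction volume is taken in milliliters, an average mass of about 330 g/mol per nucleotide of single-stranded DNA is used to convert molecule counts to micrograms, and the molecule count in the example is chosen purely for illustration. The function names are hypothetical.

```python
AVOGADRO = 6.022e23
G_PER_MOL_PER_NT = 330.0  # approximate average mass per nucleotide (assumption)


def hours_to_half_cot(mass_ug, complexity_kb, volume_ml):
    """Rough number of hours to reach 50% duplex formation (Cot1/2)."""
    return (1.0 / mass_ug) * (complexity_kb / 5.0) * (volume_ml / 10.0) * 2.0


def strands_to_micrograms(n_strands, length_nt):
    """Convert a number of single strands of given length to micrograms."""
    grams = n_strands * length_nt * G_PER_MOL_PER_NT / AVOGADRO
    return grams * 1e6


# Reference point of the formula: 1 ug of a 5 kb sequence in 10 ml.
reference_hours = hours_to_half_cot(1.0, 5.0, 10.0)

# Illustrative case (assumed numbers): 1e8 strands of each orientation,
# 500 nt long, in 10 microliters (0.01 ml).
mass = strands_to_micrograms(2e8, 500)
example_hours = hours_to_half_cot(mass, 0.5, 0.01)
```

Under these assumptions the illustrative case comes out at a few hours, which shows how strongly a small reaction volume shortens the time required for low-mass samples.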

1. The Capture Oligonucleotide Attached To The Bead As 
Probe 

The probe consists of immobilized DNA, referred to as
capture oligonucleotide or nucleic acid fragment, on the surface of a bead. The
absolute number of DNA molecules that can be attached to the bead depends on
many factors. However, it is unlikely to exceed a density determined by the
available surface area on a microsphere of a given radius. If the beads have a 10 micron
radius, their surface area is roughly 1200 square microns (=1.2x10^11 Å^2). The
approximate width of an aromatic ring is 6 Å. Thus, typically, the capture
oligonucleotides on the surface are spaced not closer than 6 Å, even if an alkyl
linker is used. At an intermolecular spacing of 6 Å, the number of capture
oligonucleotides that can be attached onto the surface of a 10 micron radius bead is
about 3x10^9. In the extreme case, a hybridization reaction may involve a single
bead with approximately one billion capture oligonucleotides attached to its surface.
For example, if the reaction takes place in about 1 ml hybridization solution, the
molarity of the specific oligonucleotide in solution is only on the order of 1x10^-12
M. This can be increased either by using a smaller hybridization volume, or by
using a larger bead. For example, a bead that is twice the size of the 10 micron
bead, could accommodate four times as many capture oligonucleotides on its
surface.
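
The bead capacity estimate can be reproduced with the following Python sketch. It assumes a perfectly spherical bead and a simple square packing in which each oligonucleotide occupies an area equal to the spacing squared; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23
ANGSTROM_PER_MICRON = 1.0e4


def sphere_area_um2(radius_um):
    """Surface area of a sphere in square microns."""
    return 4.0 * math.pi * radius_um ** 2


def max_oligos(radius_um, spacing_angstrom):
    """Upper bound on oligonucleotides per bead at a given spacing."""
    area_angstrom2 = sphere_area_um2(radius_um) * ANGSTROM_PER_MICRON ** 2
    return area_angstrom2 / spacing_angstrom ** 2


def molarity(n_molecules, volume_ml):
    """Molar concentration of n molecules in the given volume."""
    return n_molecules / AVOGADRO / (volume_ml * 1.0e-3)


area = sphere_area_um2(10.0)          # roughly 1200 square microns
capacity = max_oligos(10.0, 6.0)      # a few times 1e9 per bead
bead_molarity = molarity(1.0e9, 1.0)  # one billion oligos in 1 ml
```

Doubling the radius quadruples the capacity, since surface area scales with the square of the radius.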

2. The Target 

The target nucleic acid is free in solution. We assume that
the uppermost level of permissible nucleic acid concentration is about 10 mg/ml,
which corresponds to a molarity of 32 µM for fragments of an average size of 500
bp (duplex). Accordingly, in nonrepetitive mammalian DNA, at a DNA
concentration of 10 mg/ml an individual 500 bp fragment is present on the order of
about 1x10^-11 M. In a population of one million cDNA clones, each about 500
nucleotides long, the concentration of each individual clone is essentially the same,
i.e., about 1x10^-11 M.

The nonrepetitive fraction of denatured mammalian DNA at a concentration
of 10 mg/ml will largely reassociate within a period of one day (or thereabouts). In
this case, each hybridizing species (Watson and Crick) is present at about 1x10^-11
M. Therefore, it is reasonable to expect that the capture oligonucleotides attached
to the bead and a target population of cDNA with complexity of about one million
500 bp fragments will also reassociate in the same time period. By reassociation is
meant the formation of duplex in about half of the initial single-stranded species, not
complete elimination of all single-stranded reactants.
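
The concentration estimates for the target can be checked with a short calculation. The sketch below assumes an average mass of about 650 g/mol per base pair of duplex DNA, a common approximation that gives a value close to, but not identical with, the figure quoted in the text. Function names are hypothetical.

```python
G_PER_MOL_PER_BP = 650.0  # approximate average mass per base pair (assumption)


def duplex_molarity(conc_mg_per_ml, length_bp):
    """Molar concentration of duplex fragments of a given length."""
    grams_per_liter = conc_mg_per_ml  # 1 mg/ml equals 1 g/l
    return grams_per_liter / (length_bp * G_PER_MOL_PER_BP)


def per_species_molarity(total_molarity, n_species):
    """Concentration of each species when n species are equally abundant."""
    return total_molarity / n_species


total = duplex_molarity(10.0, 500)                 # about 3e-5 M
per_clone = per_species_molarity(total, 1_000_000) # about 3e-11 M
```

The per-clone value assumes all one million clones are equally represented, which is the idealized case considered above.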

3. Detection Limits 

It would be ideal to detect signals from target nucleic acid
hybridized to beads at a level of one in a million, which would correspond to
detection of one specific cDNA fragment among one million others. The sensitivity
of the method depends, as discussed above, on numerous factors. A fluorescence
activated cell sorting machine cannot detect the signal from fewer than 1,000-
10,000 fluorophores. Thus, the reaction must proceed sufficiently towards
completion such that this minimum number of target fluorophores becomes
annealed to the correct bead. In addition, the background, i.e., nonspecific signal
must also be considered. The experiments of Schena et al., supra, suggest that a
detection sensitivity of better than one in 10,000-100,000 is readily achievable.

To increase detection sensitivity, the hybridization reaction may be split into
several parts. For example, if the 24-mer identifier tags are used, they can be
apportioned into 100 different tubes (wells) for independent hybridization. After
the final coupling series of 8-mers to generate the set of one million 24-mers, the
beads from each of the synthesis columns are transferred to a hybridization plate
with 100 wells; thus each well has only 10,000 bead types, rather than one million.
A cDNA library containing the one million tagged cDNAs is then amplified in one
hundred parallel PCR reactions, each reaction using a different 10,000 fold
degenerate subset of the 24-mers. The amplified library material is then dispensed
into the appropriate bead-containing well for hybridization. Thus, the complexity of
the reaction is reduced by two orders of magnitude, to increase both the kinetics of
the reaction and the signal to noise ratio of the subsequent detection procedure,
e.g., where the hybridized beads are passed through a fluorescence activated cell
sorting machine, as described below.
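
The partitioning of bead types by final synthesis column can be illustrated as follows. In this Python sketch a tag is represented as a tuple of unit indices, and the last element is assumed to be the unit added in the final coupling; that representation and the function names are assumptions made for illustration. A small unit count is used for the enumeration so that it runs quickly.

```python
from collections import defaultdict
from itertools import product


def wells_by_last_unit(n_units, positions=3):
    """Group every tag (tuple of unit indices) by its final-coupling unit."""
    wells = defaultdict(list)
    for tag in product(range(n_units), repeat=positions):
        wells[tag[-1]].append(tag)
    return dict(wells)


def bead_types_per_well(n_units, positions=3):
    """Bead types per well when wells correspond to the final coupling."""
    return n_units ** (positions - 1)


small_plate = wells_by_last_unit(5)       # 125 tags split over 5 wells
full_scale = bead_types_per_well(100)     # 100 columns, three couplings
```

At full scale each of the 100 wells holds 10,000 bead types, which is the two orders of magnitude reduction in complexity described above.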

4. Enrichment, Recovery and Analysis

In preferred embodiments of the invention, the target nucleic
acids are labelled with a fluorophore, and the detection and sorting process is done
by means of a fluorescence activated cell sorter. See, supra, Section VI.D.
However, the skilled artisan will appreciate that many other means will fulfill the
same purpose.

Fluorescence activated cell sorting machines can sort beads at a rate of
about 100 million per hour. This is done in series, but it is so rapid that it competes
effectively with procedures that can be performed in parallel. It is also possible to
sort beads based on one criterion, and then re-sort based on another. For example,
sorting of fluorescence intensities within a prescribed window could be carried out
twice to improve accuracy, if necessary.

The beads are forced through a nozzle, having a diameter of typically
between 70 and 400 microns, at high pressure. Tiny liquid droplets are formed at
the nozzle spout that occasionally contain individual beads. These water droplets
are accelerated in one direction or another based on a droplet charge that responds
to a variable electrostatic field across the nozzle stream. Actuation of the field
automatically allows beads with particular parameters, e.g., size or fluorescence, to
be sorted into, typically, one of three different tubes.

As the method of the invention comprises the comparison of relative levels
of nucleic acids derived from two (or more) sources, the two target nucleic acid
populations are typically labeled with dyes whose emission peaks are separable with
the instrument. See, supra, Section VI.D. For instance, standard ABI fluorescent
dyes, Hexachloro-Fluorescein (HEX), 6-carboxy-Fluorescein (FAM), Tetrachloro-
Fluorescein (TET), Tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-
rhodamine (ROX), 6-carboxy-2', 7'-dimethoxy-4', 5'-dichlorofluorescein (JOE), 5-
carboxyfluorescein (5-FAM), and 6-carboxyrhodamine (R110) may be used. This
dye set is available commercially from the Applied Biosystems Division of Perkin-
Elmer (Foster City, California). These and numerous other fluorophores
compatible with DNA labeling, such as phycoerythrin, are also available from other
commercial sources and have sufficiently different emissions spectra that a standard
fluorescence activated cell sorting analysis can measure their intensities, and
calculate a ratio. The user can choose the ratio which provides the most useful
basis for sorting the beads, according to the desired parameters. Accordingly, for
the purposes of sorting beads based on specific characteristics of the hybridized
target nucleic acid, e.g., the ratio of nucleic acids labelled with different
fluorophores, a preferred instrument is one that can determine fluorescence intensity
in at least two wavelength channels, essentially simultaneously, as a bead-containing
droplet passes through the laser beam on its way along the nozzle stream course. In
addition, an "on-the-fly" computation must be performed such that the fluorescence
in two channels is compared as, e.g., a ratio of two colors.
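
The two-channel ratio computation and the resulting sort decision can be sketched as follows. This is an illustration under stated assumptions, not a description of any instrument's software: the threshold, the background handling, the tube names and the function names are all hypothetical.

```python
import math


def log2_ratio(channel_1, channel_2, background=0.0, floor=1.0):
    """Log2 ratio of two background-corrected fluorescence intensities.

    Values are clamped to a small positive floor so the ratio is defined
    even when a channel is at or below background.
    """
    a = max(channel_1 - background, floor)
    b = max(channel_2 - background, floor)
    return math.log2(a / b)


def sort_decision(channel_1, channel_2, threshold=1.0, background=0.0):
    """Assign a bead to one of three tubes based on its colour ratio."""
    r = log2_ratio(channel_1, channel_2, background=background)
    if r >= threshold:
        return "tube_high_channel_1"
    if r <= -threshold:
        return "tube_high_channel_2"
    return "tube_unchanged"


# Hypothetical readings in arbitrary units (assumption).
decision_a = sort_decision(4000.0, 1000.0)
decision_b = sort_decision(1000.0, 4000.0)
decision_c = sort_decision(1000.0, 1100.0)
```

A logarithmic ratio is used here so that a two-fold increase and a two-fold decrease are treated symmetrically around zero.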

In addition, beads that satisfy the sorting criteria can be recovered and the
annealed nucleic acid, suitably prepared with procedures known in the art (Hattier
et al., 1995, Mammalian Genome 6:873-879) can be used as a template in PCR
reactions. Optionally, the re-amplified material may be rehybridized to beads in
order to provide a second (or third, etc.) round of enrichment. This aspect of the
invention may be valuable in particular for the recovery of fragments derived from
cDNA libraries that have been passaged through cells. See, infra. Briefly, the
passaged cDNA fragments are quantified by hybridization to beads followed by
fluorescence activated cell sorting based on relative fluorescence, are then re-
amplified, and re-introduced into cells. This provides a mechanism for achieving
multiple rounds of enrichment, recovery, and repassage, which allows amplification
of differences in gene expression, and thus increases the sensitivity of the system.

There are a variety of methods known in the art for the determination of the
nature of the bead/capture oligonucleotide that has been recovered. Baum, 1996,
Chemical & Engineering News Feb. 12 Issue:28-64. For instance, organic
molecules may be used to tag the synthesis of combinatorial chemical reactions and
provide the basis for subsequent reading of the beads by gas chromatographic
detection. Alternatively, the beads may contain a radiographic bar code that
identifies the nature of the bound material. In yet another approach, the nature of
the capture oligonucleotide sequence attached to the bead is determined by PCR
using primer binding sites of known sequence that flank the variable portion.

In yet another alternative, it may be preferable to bypass determination of
the capture oligonucleotide sequence attached to each bead, and concentrate only
on the target nucleic acid annealed to the bead. This can be accomplished by simply
eluting the target sequence under conditions where a single bead can be isolated.
This might be accomplished by limiting dilution or by specialized robotic
attachment. PCR using known primers that flank the target fragments permits
amplification. Depending on whether or not the bound material is homogeneous to
a satisfactory degree, it may be necessary to clone the amplified fragments prior to
DNA sequence analysis. If the bound target nucleic acid is predominantly of one
type, e.g., a single cDNA clone fragment, readable DNA sequence may be obtained
immediately without an intervening cloning step.

F. Normalizing Libraries or Populations of Nucleic Acids

The bead hybridization methodology readily permits normalization
of cDNA libraries. Normalization is a process to convert a cDNA library that
represents different mRNAs in the cell according to their natural abundance, into a
library that represents different mRNAs in roughly equal amounts. For example, a
typical mammalian cell has about 500,000 individual mRNA molecules representing
a total of about 10,000 expressed genes. Some genes such as actin produce large
quantities of message, exceeding in some cases 5,000 copies per cell. Other genes,
however, are expressed only at a low level, some as low as a single copy per cell in
some cell types. In certain cases it is advantageous to produce a library that has
clones representing at the same level all the mRNAs in a cell or tissue, referred to as
an expression-normalized library.

There are a variety of methods that have been used in an attempt to achieve
library normalization. Diatchenko et al., 1995, Proc. Natl. Acad. Sci. U.S.A.
93:6025-6030; Puzyrev et al., 1995, Mol. Biol. 29:97-103; and Soares et al., 1994,
Proc. Natl. Acad. Sci. U.S.A. 91:9228-9232. Most involve competitive or
subtractive hybridization of the input mRNA used to make the library. The present
invention provides a means to transform a non-normalized library into a normalized
one. The FACS/bead method proposed here offers a largely independent method to
achieve normalization of libraries, which potentially gives the investigator more
control over the end result because subsets of clones that have different abundance
can be amplified separately and then recombined.

In a specific embodiment of the invention, tagged cDNA inserts, bearing
identifier sequence tags, e.g., the 24-mers, amplified from a library as described
supra, see Section VI.C, are hybridized in solution to random-primed
cDNA made from mRNA isolated from the cells of interest. The cDNA is labeled
with a first label, for example a fluorophore. After some appropriate time of
hybridization under conditions that promote the formation of perfectly matched
duplexes between the cDNA inserts derived from the library and the labeled cellular
cDNA, the mixture is added to beads which have attached thereto capture
oligonucleotides containing the complements of the oligonucleotide identifier tags,
in the presence of free oligonucleotide identifier tag sequences comprising a second
label as competitors. The second stage hybridization, under conditions that
promote the formation of perfectly matched duplexes, is permitted to go to a high
Cot (up to 24 hours). During this hybridization phase, the free oligonucleotide
identifier tag sequences comprising the second label compete with the cDNA
inserts, which are indirectly labeled with the first label through the cellular cDNA
used during the first hybridization, for hybridization to the appropriate capture
oligonucleotides attached to beads. The ratio of first and second label reflects the
abundance of particular mRNA sequences in the original cells. The label attached
to the competing free oligonucleotide identifier tag sequences provides a means to
control the amount of capture oligonucleotide on the bead, i.e., it permits a
comparison to be made, instead of an absolute measurement of fluorescence. For
example, an abundant transcript such as actin will be identified by a large
first/second label ratio on a bead that contains an actin cDNA clone attached via its
identifier tag. A weakly expressed sequence is identified by a small first/second
label ratio. If fluorescent labels are used, e.g., HEX and FAM, the population of
hybridized beads is sorted by fluorescence activated cell sorting for prescribed
first/second label ratios into particular bins, each bin representing cDNA clones
derived from transcripts with a particular level of abundance. cDNA clones from
particular bins are amplified to a particular level. After amplification, the cDNAs
from each bin are re-mixed. This process results in heightened representation of
weakly expressed sequences, and suppressed representation of abundant mRNAs.
Altogether, the process produces normalization.
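
The binning and re-mixing steps can be sketched in Python as follows. The sketch makes an assumption that the text leaves open: each bin is amplified by a factor inversely proportional to the mean first/second label ratio of its members, which is one plausible way to choose the "particular level". The bin edges, clone names and ratio values are invented for illustration, and ratios are assumed to be positive.

```python
def assign_bin(ratio, edges):
    """Return the index of the first bin edge the ratio falls below."""
    for i, edge in enumerate(edges):
        if ratio < edge:
            return i
    return len(edges)


def amplification_factors(clone_ratios, edges):
    """Per-bin factor, inversely proportional to the bin's mean ratio."""
    bins = {}
    for ratio in clone_ratios.values():
        bins.setdefault(assign_bin(ratio, edges), []).append(ratio)
    return {b: 1.0 / (sum(rs) / len(rs)) for b, rs in bins.items()}


def normalized_representation(clone_ratios, edges):
    """Relative representation of each clone after binning and re-mixing."""
    factors = amplification_factors(clone_ratios, edges)
    return {
        clone: ratio * factors[assign_bin(ratio, edges)]
        for clone, ratio in clone_ratios.items()
    }


# Hypothetical first/second label ratios (assumption).
ratios = {"abundant_clone": 100.0, "moderate_clone": 10.0, "rare_clone": 0.1}
result = normalized_representation(ratios, edges=[1.0, 50.0])
```

With one clone per bin the result is exactly equal representation; with several clones per bin the residual spread is limited to the width of the bin, so narrower sorting windows give tighter normalization.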

In another embodiment of the invention, a similar normalization procedure is
carried out with cDNA clones representing the 3' ends of cellular transcripts. This
results in a set of 3' ESTs, representing, theoretically, all transcribed genes in a
particular cell or tissue. These EST tags may be used in subsequent experiments to
monitor gene expression levels. For example, if clones prepared from the
normalized 3' EST library are gridded out into 96-well trays and amplified
individually by PCR, 10,000 such PCR reactions on 10,000 independent clones
would produce a set that represents a large fraction of all 3' ends in the cell. If these
are attached to beads, the beads may be pooled and used in hybridization
experiments and, e.g., fluorescence activated cell sorting analysis is used to
determine expression profiles of genes in particular cells or tissues.

The collection of 3' ESTs generated in this fashion can also serve as a
substrate for DNA sequencing directly, permitting EST comparisons to be made
between cell types or tissues with the minimum sequencing redundancy.

Determination Of Relative mRNA Levels In Cells 
Transcript levels in a cell are a meaningful indication of gene activity,
in establishing a "molecular phenotype" of the cell. Mutations of certain genes may
alter the expression pattern of other genes, and thus the molecular and possibly the
physiological phenotype of the cell, which may result in severe pathological
conditions, such as cancer. Therefore, information about relative transcript levels of
specific genes in a cell is very valuable. However, measurement of transcript levels,
though straightforward in the case of a few genes at a time, is, with currently
available methods, a challenging task for large numbers of genes.

In some instances it may be even more valuable to obtain comparative
expression information from genes in two or more different cell types, not simply
relative expression levels within one cell type. For instance, when two cell types,
e.g., a tumor cell and a normal cell, are compared, it is less interesting to focus on
genes whose expression is unaltered, but of great potential significance to define
genes whose expression is altered between the two cell types. The present
invention provides a convenient mechanism for achieving this goal.

In an embodiment of the invention, comparison of the mRNA levels in
different cell types, e.g., a tumor and non-tumor cell, is accomplished essentially
with the procedure described for library normalization, supra. However, instead of
including a labeled free oligonucleotide identifier tag sequence for ratio
comparisons, cDNA comprising a first label, derived from the tumor cell, is mixed
with cDNA comprising a second label, derived from the normal cell, and hybridized
during the first stage with identifier sequence-tagged cDNA library clones. The
second phase of hybridization involves annealing of the tagged cDNAs, plus
hybridized labeled cDNA, to the beads having attached thereto complements of the
identifier tags as capture oligonucleotides. The beads are sorted, e.g., where
fluorescent labels are used, by fluorescence activated cell sorting analysis, to
identify beads that have an unequal first/second label ratio. Such beads are
collected, optionally re-sorted and/or rehybridized, and the attached cDNA insert

In another embodiment, comparative quantitation of mRNA levels in two
[illegible] beads having attached thereto random oligonucleotides
as capture oligonucleotides, preferably of a length [illegible]
(20) nucleotides. In most preferred embodiments, 15-mers are a useful compromise
between the total complexity of the sample, i.e., (4)^15 = 1.1 x 10^9, and the melting
point (Tm) of the duplex that can be formed. Specifically, the complexity of
15-mers is very high, i.e., roughly one billion (1.1 x 10^9) different 15-mers, while the
melting point of about 45 °C (depending on the base composition) allows
hybridization at reasonably stringent conditions. If a target mixture of nucleic acids
composed of similar or less complexity is exposed to beads that contain random
15-mers, each bead on average should hybridize to at least one target species. Given
that an average mammalian cell contains roughly 10,000 active genes, each with
about 2,000 nucleotides of unique sequence, the complexity of this population is
about 20 million bp. If a random subset of the billion fold complex beads
numbering two million is chosen, every target sequence of average length 500 bp
should hybridize to one among the two million beads. Each 15-mer is expected,
under certain conditions, to preferentially hybridize to specific sequences that are
present in a complex target nucleic acid mixture. cDNA is prepared from the two
sources to be compared, one cDNA sample is labeled with a first label, e.g., HEX,
the other is labeled with a second label, e.g., FAM. The two cDNA populations are
pooled and subjected to hybridization with beads having attached thereto the
random capture oligonucleotides, e.g., random 15-mers. After hybridization to high
Cot, the beads are washed and passed through a fluorescence activated cell sorter.
Specifically, the beads are sorted based on HEX>FAM and FAM>HEX. All
comparisons are internal, involving only fluorescence intensity ratios, not absolute
intensities. If the labeled cDNAs have been prepared such that they contain PCR
primer sites on both ends, the beads can be retrieved and the bound cDNA can be
amplified, (possibly cloned) and sequenced.
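
The arithmetic behind the choice of 15-mers can be checked with a short Python sketch. The figures are those given in the text; the simplifying assumptions (each bead carries one distinct random 15-mer, only perfect matches on one strand are counted, and all variable names) are illustrative and not part of the disclosure.

```python
K = 15                                # capture oligonucleotide length
kmer_complexity = 4 ** K              # number of distinct 15-mers

genes = 10_000                        # active genes per mammalian cell (from the text)
unique_nt_per_gene = 2_000            # unique sequence per gene (from the text)
target_complexity_bp = genes * unique_nt_per_gene

bead_subset = 2_000_000               # random subset of beads (from the text)
target_length = 500                   # average target length in bp (from the text)

# Number of 15-nucleotide windows in one 500 bp target.
windows_per_target = target_length - K + 1

# Expected number of beads in the subset whose 15-mer occurs in a given target,
# assuming bead sequences are drawn uniformly from all possible 15-mers.
expected_beads_per_target = windows_per_target * bead_subset / kmer_complexity

print(f"distinct 15-mers:          {kmer_complexity:,}")
print(f"target complexity (bp):    {target_complexity_bp:,}")
print(f"expected beads per target: {expected_beads_per_target:.2f}")
```

Under these assumptions the expected number of matching beads per 500 bp target is close to one, consistent with the statement that every target should hybridize to one among the two million beads.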
H. Post-Passage Library Comparison

In a preferred embodiment, the methods of the invention are used to
compare genetic libraries that have been grown in different host cells. Similar to the
type of comparative analysis described in Section VI,K, supra, the methods can be
employed to determine, for example, the effects of a particular mutation or
alteration in a cell, or of agents that cause such a phenotypic change. Provided that
the agent (termed "perturbagen") can be encoded by DNA, the bead hybridization
technology allows isolation of the relevant causative agent. See, U.S. Patent
Application Serial No. 08/699,266, filed August 19, 1996, incorporated herein by
reference in its entirety.

More specifically, a gene library, constructed in a vector that allows
expression in the host cell types of interest, is introduced into one or more cell
types. The host cells are permitted to grow for several divisions. Subsequently, the
gene library is re-isolated using one of several possible procedures including PCR,
see, supra, and biochemical enrichment is performed. This enrichment allows
sequences that have been lost from one of the propagated libraries to be selectively
amplified compared with sequences shared in common. Multiple rounds of library
propagation, isolation, and biochemical enrichment may be required to achieve
purification of the relevant differences in the library. This approach provides the
means to identify specific sequences that are selectively lost from a library during
propagation on particular host cells. Such differences are candidates for genes,
gene fragments, or random sequences, depending on the library type, that cause
arrest or cell death in a particular host cell or selective growth enhancement.
Comparing sequences, referred to as "post-passage library comparison", permits
those sequences that cause selective cell death or stasis in one cell type and not
another to be recovered.

Choice of library and library size are important factors. If endogenous gene
or gene fragment sequences are preferred, the libraries must be constructed from
genomic DNA or cDNA prepared from the prospective host cell itself. If random
sequences are desired, libraries need to be constructed that contain such inserts. The
library must contain enough independent clones to ensure that the relevant sequences
will be contained in it. The library must propagate efficiently on, or be able to establish
itself inside, the chosen host cells.

The characteristics of the cells used to propagate the library are also
important, since sequences will be recovered from the procedure that affect the
particular host cells and perhaps not others. This trait may be used to advantage so
that library comparisons are made between the same library grown on different host
cells. This permits recovery of library sequences that are, e.g., selectively lost from
one host and not the other.

The problem of genetic drift also has to be considered. As libraries are
propagated, random fluctuations in sequence representation will occur, a
phenomenon akin to genetic drift in isolated populations of interbreeding organisms.
Such random differences will introduce a type of noise into the process that may
limit its effectiveness in isolating relevant sequences from the libraries that are lost
during passage.
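
The effect of drift on clone representation can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model taken from the disclosure: each passage is treated as random resampling of a fixed-size library with replacement, no clone has a growth advantage, and the function name, library size, and number of passages are hypothetical.

```python
import random


def passage(counts, rng):
    """Resample a library of fixed size; returns new clone counts.

    counts: {clone: copies}. Each copy in the next passage is drawn at random
    in proportion to the current counts (neutral drift, no selection).
    """
    clones = list(counts)
    weights = [counts[clone] for clone in clones]
    total = sum(weights)
    drawn = rng.choices(clones, weights=weights, k=total)
    new_counts = dict.fromkeys(clones, 0)
    for clone in drawn:
        new_counts[clone] += 1
    return new_counts


rng = random.Random(1)                      # fixed seed for repeatability
library = {f"clone_{i}": 10 for i in range(5)}
history = [library]
for _ in range(20):                         # 20 hypothetical passages
    history.append(passage(history[-1], rng))

for clone, copies in history[-1].items():
    print(clone, copies)
```

Even though every clone starts at the same count and none is selected for or against, the counts diverge over passages, and a clone that reaches zero cannot reappear. This is the noise that must be distinguished from genuine selective loss.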

The degree of enrichment, i.e., the enrichment factor, during each step is an
important variable. The extent of enrichment determines the number of cycles that
must be performed before the sequences of interest can be recovered from the
libraries. Enrichment occurs during two steps in each cycle: at the level of growth
of the library on the host cells, and during the biochemical selection for differences
that have appeared in the two libraries being compared.

The number of host cell doublings is also important. In certain cases, it may
be desirable to limit the number of host cell doublings to avoid, for example,
extensive genetic drift. In other cases, it may be helpful to prolong library
propagation so that differences become accentuated.

Mutations occurring during the library propagation also have to be
considered. Mutations may occur in library sequences either as they propagate in
the host cells, or as they are isolated following propagation, particularly if PCR is
used in this isolation process. Such mutations may limit the sensitivity of the
comparison, because a mutant sequence that continues to propagate where the
original sequence did not, may, if it remains similar enough in sequence to the
original, confound or interfere with the biochemical enrichment steps.

The number of cycles is yet another important factor. The process of library
propagation, re-isolation and biochemical selection could be repeated multiple times
to achieve sufficient enrichment. This is a variable that needs to be determined
based on other factors such as genetic drift, degree of enrichment per step, and
mutation rates.
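
The relationship between the per-cycle enrichment factor and the number of cycles can be sketched as follows. The calculation assumes, purely for illustration, that the enrichment factor is constant from cycle to cycle and that drift and mutation are ignored; the function name and the example numbers are hypothetical.

```python
def cycles_needed(fold_required, enrichment_per_cycle):
    """Smallest number of cycles whose cumulative enrichment reaches fold_required."""
    if enrichment_per_cycle <= 1:
        raise ValueError("enrichment per cycle must exceed 1")
    cycles = 0
    achieved = 1.0
    while achieved < fold_required:
        achieved *= enrichment_per_cycle
        cycles += 1
    return cycles


# A sequence present at 1 in 1,000,000 that must become the dominant species:
print(cycles_needed(1_000_000, 100))   # 100-fold enrichment per cycle
print(cycles_needed(1_000_000, 20))    # 20-fold enrichment per cycle
```

A lower enrichment factor per cycle demands more cycles, and each additional cycle adds further opportunity for drift and PCR-derived mutation, which is why these variables have to be balanced against one another.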

Gene Libraries. Gene libraries, usually cDNA or genomic, can be
constructed in a variety of vectors including plasmid and viral vectors by methods
[illegible]; for references see Sambrook et al., supra.
The library vectors can be designed to propagate on one or more of a variety of cell
types including bacteria, yeast, or mammalian cells. In some cases the libraries are
intended to be as representative of the nucleic acids present in a particular organism
or tissue as possible. These are termed total genomic or cDNA libraries. In other
cases the libraries are intended to contain only a subset of sequences; for example,
those sequences that are prevalent in one cell type and absent in another. Such
limited libraries can be constructed using, for example, cDNA from one source that
has been treated with subtraction or blocking procedures as suggested above to
remove sequences held in common with a second source. See, supra.

Libraries have traditionally been used in two ways, for biochemical screens
and for genetic screens. The process of screening allows isolation of sequences of
[illegible]
either a nucleic acid probe or a protein probe such as an antibody (in the case of
expression libraries). Specific genes or gene fragments can be fished out of a library
using an appropriate probe. Genetic screens permit recovery of sequences from a
library of genes or gene fragments which complement or rescue a particular mutant
phenotype using an appropriate selection scheme. For example, if a yeast genomic
library is introduced into HIS3- yeast cells and plated on media lacking histidine,
only cells that have acquired library sequences that contain a functional HIS3 gene
will be able to grow. These growing colonies can be treated such that the resident
library sequences are recovered.

A number of ways can be envisioned to enrich and identify differentially
expressed library members. For example, Representational Difference Analysis
(RDA) permits the purification of sequences that differ substantially between two
samples because, e.g., they contain a restriction fragment length polymorphism.
RDA and similar methods are currently being used by commercial and academic
research groups to identify resident pathogenic genomes and interesting lesions in
tumors. For example, RDA was used to identify a homozygous deletion in a
pancreatic xenograft which proved to include the breast cancer susceptibility gene
BRCA2, Schutte et al., 1995, Cancer Res. 55:4570-4574. However, the resolution
of RDA is rather limited; in addition, the method is not exhaustive, as it is subject to
the inherent biases of PCR, including the tendency of certain fragments to dominate
the amplification process.

A second approach is to use selective PCR amplification of sequences that
are not held in common between two clones isolated from the same library, for
example as described by Clontech, Inc., Palo Alto, CA. Alternatively, biochemical
enrichments may be used that involve solution hybridization followed by selective
physical separation of hybridized sequences using, for example, biotinylated DNA
and avidin beads.

[illegible] provided by the methods of the present invention. For example, if a library of
cDNA fragments (tagged with identifier sequences) is introduced into two cell types
and the cells are allowed to grow for several divisions, the library can be reisolated
from each cell type and the individual clones from each library can be compared
using the beads. PCR amplification of the sequences carried by the two cell types
allows amplification of the individual clones, and labeling with, e.g., HEX and FAM
separately such that one post-passage library carries HEX and the other carries
FAM. If these passaged libraries are hybridized to beads and analyzed by
fluorescence activated cell sorting, cDNAs can be recovered that are over-
represented or under-represented in one or the other cell type. For example, a
specific cDNA clone that is over-represented in one cell type compared with the
other cell types is a candidate for a sequence that selectively causes the first cell
type to grow. The cDNA is also a candidate for a sequence that causes selective
death or growth arrest in the second cell type. These interesting candidates can be
studied further after their identification.

I. Data Management

As with any high throughput method capable of collecting a large
body of information rapidly, data management is an important issue. With the
invention described herein, the major types of information will be related to
expression profile, DNA sequence, fluorescence intensity, and indirectly, effect of
the sequences on cell growth. The data obtained may be conveniently handled
using standard relational or spreadsheet data formats. In addition, in many cases it
will be useful to search with each newly obtained sequence against local databases,
i.e., against sequences identified through non-public experiments, and against global
databases, e.g., databases derived from the efforts of sequencing the human
genome. Sequence matches will allow extension of sequences obtained using the
present invention, as well as, in some cases, correlation of an unknown sequence
with a known gene. The "intensity" information can be used as a substitute for
expression level or relative abundance of a particular nucleic acid sequence in a
library.

Specialized tools can be envisioned to visualize the data that are obtained
from the present methods in order to interpret the patterns of gene expression and
the spectrum of biological effects that particular sequences exert in specific cell
types. For example, such tools may involve multiple pairwise comparisons, or an
averaging or summation method that depicts the cumulative results of several
experiments in order to identify those nucleic acid sequences that are either most
frequently altered in expression, or exert the most frequent or largest effect on cell
growth. Many databases, sequence analysis packages, searching engines, and
graphical interfaces are available either commercially or free over the internet.
These include the Genetic Data Environment (GDE), ACEdb, and GCG. In many
cases, off the shelf solutions to specific problems are available. Alternatively,
software packages such as GDE readily permit customization to solve particular
problems in sequence analysis, data storage, or data presentation.

J. Quantitation Of Genomic DNA Fragment Ploidy 

In certain situations, it is useful to determine the ploidy, i.e., the
copy number, of specific chromosomal regions or loci. For example, cancer cell
regions that contain heterozygous deletions (LOH) or homozygous deletions often
include tumor suppressor genes that are involved in the negative regulation of cell
growth. In contrast, regions that contain DNA amplifications or translocations
frequently contain oncogenes, i.e., genes that promote cell growth. Thus, the
boundaries of aneuploid chromosomal regions can be used to localize genes that are
involved in tumor progression.

Several methods have been used previously to localize regions of
aneuploidy. These include cytogenetics Rowley, 1990, Cancer Res. 50:3816-3825,
fluorescence in situ hybridization (FISH) van Dekken et al., 1990, Cancer 66:491-
497, Comparative Genome Hybridization (CGH) Kallioniemi et al., 1992, Science
258:818-821, genotypic analysis using Restriction Fragment Length Polymorphisms
(RFLPs) Botstein et al., 1980, Am. J. Hum. Genet. 32:314-331, Variable-length
Nucleotide Tandem Repeats (VNTRs) Boerwinkle et al., 1989, Proc. Natl. Acad.
Sci. U.S.A. 86:212-216, or microsatellite repeats Weber, 1990, Curr. Opin.
Biotechnol. 1:166-171, and RDA Lisitsyn et al., 1995, Methods Enzymol. 254:291-
304.

Cytogenetics, FISH, and CGH all utilize whole chromosomes mounted on
solid supports such as glass slides. The combination of visible dyes or fluorescent
dyes with microscopy permits identification of regions that contain gross
chromosomal abnormalities such as LOH and amplification. In the case of CGH,
much of the analysis has been automated. The weakness of these approaches
primarily involves the level of resolution. Only lesions that are of considerable size,
typically at least 10 megabases, can be detected with, e.g., CGH. Thus, smaller
lesions, i.e., the vast majority of, e.g., homozygous deletions, are not detectable.

Genotyping via RFLPs, VNTRs, or microsatellites involves a comparison
between tumor DNA and normal DNA from the same individual of polymorphic
markers located at specific sites within the genome. If the relative intensities of two
alleles at a particular marker locus differ significantly between the tumor and normal
sample, the locus is considered to be aneuploid. If cell lines are used, such
comparisons are often not possible. However, homozygous deletions can be
detected easily by the failure of particular sequences within the deletion to amplify.
These methods suffer from the drawback that a great deal of labor is required to
achieve high resolution. For example, if a genome wide search for aneuploidy is
undertaken at ten (10) megabase resolution, a minimum of 300-500 markers is
required.
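
The marker estimate follows from dividing the genome size by the desired resolution. The sketch below assumes a haploid human genome of roughly 3,000 megabases and evenly spaced markers; the `informative_fraction` parameter, which models markers that turn out to be uninformative in a given individual, is a hypothetical addition used only to show how the count can rise from 300 toward 500.

```python
import math


def min_markers(genome_mb, resolution_mb, informative_fraction=1.0):
    """Minimum number of evenly spaced markers for a genome-wide scan.

    informative_fraction is an assumed share of markers that yield usable
    data (e.g., heterozygous in the individual tested).
    """
    if not 0 < informative_fraction <= 1:
        raise ValueError("informative_fraction must be in (0, 1]")
    return math.ceil(genome_mb / (resolution_mb * informative_fraction))


print(min_markers(3000, 10))         # every marker informative
print(min_markers(3000, 10, 0.6))    # assume 60% of markers are informative
```

Halving the resolution interval doubles the marker count, which is the labor cost the text refers to.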

RDA is a PCR-based [illegible] that can be used to detect RFLPs, some of
which prove to be sites of aneuploidy in a tumor sample. The approach has been
especially effective in isolation of fragments derived from homozygously deleted
regions Schutte et al., 1995, Cancer Res. 55:4570-4574. The approach involves
hybridization between restriction enzyme-digested, PCR-amplified "driver" tumor
DNA and "tracer" normal DNA. Sequences shared between the two samples are
removed as potential PCR templates by formation of hybrids between tumor and
normal DNAs. These hybrids are treated so that they fail to amplify in a subsequent
PCR step. Only sequences from the tracer sample that are not shared with the
driver DNA can be amplified. After multiple rounds of hybridization and PCR, such
unique fragments emerge as individual products that can be visualized on gels and
cloned. The weakness of RDA is that of necessity it involves a step to reduce
complexity of the total genomic DNA mixture, i.e., the first PCR step, thus limiting
the resolution of the process. In addition, the method is technically demanding and
subject to the inherent biases of PCR, including the tendency of certain fragments to
dominate the amplification process.

The present invention provides a solution to many of the inherent
weaknesses of the currently available strategies for isolation of aneuploid
chromosomal regions. Specifically, the beads having attached thereto capture
oligonucleotides or nucleic acid fragments are used to bind individual genomic
DNA sequences, labeled to permit quantitative comparisons of DNA content
between two samples. Several specific procedures to accomplish this task can be
envisaged. One approach involves generation of a germline genomic DNA library
by shearing genomic DNA to an average size of about 500 bp. These fragments are
attached to linkers that contain identifier tags, and inserted into an appropriate
phage or plasmid cloning vector. For a human genome sized library, for example, a
total of about 6 million clones are required. An equivalent number of beads with
cognate identifier sequence tag complement oligonucleotides are also needed.
Hybridization of the beads to the genomic library permits the individual clones to be
spread out one by one over the set of beads. These genomic fragments can then be
hybridized in a second round to a mixture of two genomic DNA samples each
labeled with a different fluorescent dye (the order of these two hybridization
reactions could be inverted). Fluorescence activated cell sorting analysis permits
recovery of beads that have bound a ratio of dye molecules that deviates significantly
from unity. The fragments of library genomic DNA bound to the beads, i.e., library
inserts originally prepared so that they have PCR primer sites for analysis, can be
eluted from the beads and amplified by PCR. These fragments can be aligned to the
human physical map either based on their DNA sequence or by additional PCR
experiments. Thus, the positions of LOH regions, homozygous deletions, and
amplifications can be defined.
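
The library size and the ratio-based sorting decision can be sketched in Python. The library size uses the figures in the text together with an assumed haploid genome of about 3 x 10^9 bp at one-fold coverage. The classification thresholds (`tolerance`, `floor`), the category labels, and the function name are hypothetical values chosen for illustration, not parameters from the disclosure.

```python
import math

GENOME_BP = 3_000_000_000      # assumed haploid human genome size
FRAGMENT_BP = 500              # average sheared fragment size (from the text)
clones_needed = GENOME_BP // FRAGMENT_BP   # one-fold coverage


def classify_bead(tumor_signal, normal_signal, tolerance=0.35, floor=0.1):
    """Classify a bead by its tumor/normal dye ratio.

    tolerance: allowed |log2 ratio| around unity before a bead is flagged.
    floor: ratio below which the locus is treated as absent from the tumor.
    Both thresholds are illustrative assumptions.
    """
    if normal_signal <= 0:
        raise ValueError("normal_signal must be positive")
    ratio = tumor_signal / normal_signal
    if ratio < floor:
        return "homozygous deletion"
    log_ratio = math.log2(ratio)
    if log_ratio < -tolerance:
        return "heterozygous loss"
    if log_ratio > tolerance:
        return "amplification"
    return "balanced"


print(f"clones needed: {clones_needed:,}")
for tumor, normal in [(5, 100), (50, 100), (100, 100), (400, 100)]:
    print(tumor, normal, classify_bead(tumor, normal))
```

In practice the thresholds would be set from control hybridizations, since the spread of ratios around unity for unaltered loci determines how small a deviation can be called reliably.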

K. Comparison Of Promoter Activity

An alternative method for assessing gene activity encompassed in
this invention involves the assessment of promoter activity in specific cell types.
Specifically, genomic library fragments are identified which drive expression of a
reporter gene in certain cellular environments. Such an approach permits an
indirect functional analysis of the transcriptional factor milieus of different cells.
This strategy is based on the fact that genes can be activated by promoter fusions,
i.e., insertions, typically upstream, of transcriptional activation sequences that
induce transcription of adjacent genes.

In the specific formulation of the strategy relevant to the invention described
herein, a genomic library with inserts ranging from a few basepairs to several
kilobasepairs is inserted into a vector such that each of the derived clones in the
library has a sequence identifier tag attached. The size of the library can vary, but
most typically will not exceed ten (10) million independent clones. The identifier
tags are located between a poly(A) addition site and a reporter sequence that
produces a stable transcript. The library is introduced independently into two cell
populations. These cell populations may represent different cell types, or may be
derived from the same cell type, where one population has been treated differently,
e.g., with a small molecule compound under study. The cells are allowed enough
time to express the introduced library sequences prior to harvesting and conversion
of cellular RNA into labeled cDNA. In general, only genomic DNA sequences
capable of inducing RNA expression of the reporter sequences, i.e., promoters, will
produce significant amounts of transcript that can be detected subsequently by
hybridization to beads. Because the cDNAs from the two samples are labeled with
different dyes, the ratio of signal intensities emitted by the two dyes can be used to
identify genomic sequences that are differentially active in the two cell populations.
These differences may reflect disparities in the active transcriptional machinery in
particular cell populations. Such differences may be useful in assessing, for
example, the degree to which a particular stimulus or agent affects a particular cell
type, especially in a differential manner compared to another cell type. Such
differences may be indicative of potential side effects that a drug candidate may
produce. The technique may also allow recovery of promoter sequences that have
differential activity in two cell types or tissues, an achievement that has relevance in
gene therapy, e.g., for the targeting of gene activity in specific cell types.

The below examples explain the invention in more detail. The following
preparations and examples are given to enable those skilled in the art to more
clearly understand and to practice the present invention. The present invention,
however, is not limited in scope by the exemplified embodiments, which are
intended as illustrations of single aspects of the invention only, and methods which
are functionally equivalent are within the scope of the invention. Indeed, various
modifications of the invention in addition to those described herein will become
apparent to those skilled in the art from the foregoing description and
accompanying drawings. Such modifications are intended to fall within the scope of
the appended claims.

VII. EXAMPLES 

A. Example 1: Synthesis of Capture Oligonucleotides on Beads
Using Base-Stable Chemical Linker

This example illustrates the chemical synthesis of capture
oligonucleotides on the surface of beads such that the resulting capture
oligonucleotides are covalently joined to the bead surface via their 3' ends and do
not dissociate from the bead in the presence of base concentrations sufficient to
remove protecting groups from the bases.

Polystyrene beads 30 microns in diameter, derivatized
with primary amines, were obtained from Pharmacia and exposed to standard
coupling chemistries in an ABI 394 DNA synthesizer (Applied Biosystems, Foster
City, California). The initial coupling step involved the attachment of a
phosphoramidite base to the bead via nucleophilic attack of the primary amine. This
linkage was oxidized to a phosphoramidate by treatment with molecular iodine.
The phosphoramidate linkage was base stable and the beads were now treated in the
same manner as resins used during standard oligonucleotide synthesis in terms of
reagents and cycle times. The extension products were stable and the beads can be
used for hybridization as illustrated in subsequent examples.

B. Example 2: Sorting Of Beads Using A Fluorescence-Activated 
Cell Sorter 

This example illustrates the sorting of nucleic acids captured by
beads with a fluorescence activated cell sorter.

Nucleic acid pools derived from two different sources are labeled with two
different fluorophores, one with HEX, the other one with FAM. The beads with
covalently attached capture oligonucleotides are hybridized using stringent
conditions to equal amounts of nucleic acids derived from the two different sources.
More specifically, 100,000 beads containing on their surfaces roughly 10-100
million copies per bead of a random 15-mer sequence are placed in 100 µl of
hybridization buffer (2x SSPE, 0.1% Triton) along with equal amounts of FAM-
labeled cDNA and HEX-labeled cDNA from different sources, and heated to 95 °C
in a thermocycler (MJ Research) for 2 minutes. The mixture is cooled to 40 °C and
left to hybridize for 24 hrs. The sample is then washed three times at room
temperature in 1x SSPE, 0.1% Triton, followed by resuspension in 1 ml of PBS.
The hybridization reaction can be scaled up to include more beads, e.g., 2-5 million.
Subsequently, the beads are sorted using a fluorescence activated cell sorting
machine in order to identify those which are labeled with an excess of HEX or an
excess of FAM. FIGURE 1 shows the capture oligonucleotides attached to the
bead surface as black squiggly lines. The gray (F) and black (H) lines represent
chromophore-labeled cDNAs from two different sources.

C. Example 3: Sensitivity Of The Oligonucleotide-Conjugated
Beads: Signal/Noise Ratio

The following experiment shows the sensitivity of the
oligonucleotide-conjugated beads in hybridizations and fluorescence activated cell
sorting analysis. As depicted in FIGURE 2, the signal/noise ratio was as low as
1000:1, calculated by dividing the saturating fluorescence at 60 µM by the
background autofluorescence.

50,000 beads were used having attached to their surface an estimated 1-10 x
10^[illegible] copies of capture oligonucleotide CO1 per bead. The hybridization conditions
were as follows: The 50,000 beads in 100 µl of 2x SSPE, 0.1% Triton were mixed
with the complement of CO1 (CCO1), which was labeled with FAM at the
indicated concentrations (FIGURE 2) and the sample was heated to 93 °C for 3
minutes, followed by annealing at 55 °C for 15 minutes. The beads were then
pelleted and washed 3 times in 70 °C 1x SSPE, 0.1% Triton to remove the unbound
labeled CCO1. Finally, the sample was resuspended in PBS and analyzed on the
Becton Dickinson FACScan flow cytometer (Becton Dickinson, San Jose,
California).

FIGURE 2 shows a histogram of the number of events, i.e., beads, plotted
against the fluorescence intensity. The labeled peaks represent beads that have been
hybridized overnight with the chromophore (FAM)-labeled complementary
oligonucleotide at different concentrations ranging from zero (0) (background) to
100 µM.

D. Example 4: Sensitivity Of The Oligonucleotide-Conjugated
Beads: Range Of Sensitivity

The following experiment shows that 1% specific beads can be
distinguished from the 99% nonspecific, unhybridized beads by a fluorescence
activated cell sorting instrument. As depicted in FIGURE 3, the sensitivity of the
technique is sufficiently high that a target concentration of between 400 pM and 4
nM can easily be detected above the background ("beads only").

In this experiment, two populations of oligo-conjugated beads were mixed
prior to hybridization. One population contained the specific oligonucleotide, while
the second population, present at a 100-fold higher concentration, contained a
different, unrelated oligonucleotide. Capture oligonucleotide 1 (CO1) was directly
synthesized on 1% of the beads and CO2 on 99% of the beads. Both CO1 and CO2
were 20 base oligonucleotides. The sequence of CO1 was: OCT GCA TAA ACC
GAC TAC AC [SEQ ID NO:1], and is derived from the E. coli LacZ gene
sequence. The sequence of CO2 was also derived from LacZ: GCA TTA TCC
QAA CCA rCC GC [SEQ ID NO:2]. The beads were estimated to contain on
average about 1 x 10^[illegible] copies of each sequence on their surfaces. The conditions of
hybridization were as follows: 100,000 total beads were incubated in the presence
of the indicated concentration of complementary CCO1, labeled with FAM, in 2x
SSPE, 0.1% Triton. The 100 µl reaction was heated to 9[illegible] °C for 3 minutes and
then hybridized at 55 °C for 15 hours. The beads were pelleted by centrifugation
and the supernatant containing the unbound fluorescent oligo was removed. The
pelleted beads were washed three times with 500 µl of 70 °C, 1x SSPE, 0.1%
Triton. The beads were resuspended in 600 µl PBS before analysis on a Becton
Dickinson FACScan flow cytometer (Becton Dickinson, San Jose, California).

FIGURE 3 shows a histogram of the number of events, i.e., beads, plotted
against the fluorescence intensity. The labeled peaks represent beads that have
been hybridized overnight with the chromophore (FAM)-labeled complementary
oligonucleotide at different concentrations.

E. Example 5: Sensitivity Of The Oligonucleotide-Conjugated
Beads: Determination Of Background Noise

The following experiment is essentially the same as in Example 2, except that
a high concentration (100 µM) of nonspecific target oligonucleotide, unrelated
to the oligo sequence on the beads, was included in the hybridization. This
permits an assessment of the background noise caused by nonspecific nucleic
acids in the experiment. As depicted in FIGURE 4, the signal/noise remains high
even in the presence of a roughly 100,000-fold excess of nonspecific sequences.

F. Example 6: Sorting Beads Based on Fluorescence Intensity 
Ratios 

The following example shows how fluorescence intensity ratios of two
different fluorophore labels can be used to sort beads into distinct
populations, each population having a defined intensity ratio.

A Becton Dickinson "FACS Vantage" cell sorter was used with "Cell Quest"
software and an argon laser (Becton Dickinson, San Jose, California) to excite
FAM and HEX dyes attached to oligonucleotides captured by beads conjugated with
complementary oligonucleotides. Two filters were used, a 530 +/- 15 nm filter
to detect FAM emission and a 585 +/- 21 nm filter to detect HEX emission.
40,000 beads conjugated with LacZ2RA' oligonucleotide (sequence CC GAG TGT GAT
CAT CTG GTC [SEQ ID NO:3]; roughly 1-10 x 10^[exponent illegible]/bead) were
exposed in a 50 µl volume of 2x SSPE, 0.1% Triton solution to oligonucleotides.
Various ratios of HEX- or FAM-labeled LacZ2RA' oligonucleotide were used,
including FAM:HEX of 100:0, 90:10, 75:25, 50:50, 25:75, 10:90, 0:100. The
combined concentration of the labeled oligonucleotides was 4 [unit illegible]
in all samples. The reaction solution was heated first to 95°C for one minute,
and allowed to anneal at 30°C for 10 minutes. Every 90 seconds the samples were
vortexed. The beads were then washed 3x at room temperature in 1 ml of 1x SSPE,
0.1% Triton. The beads were then resuspended in 1 ml of PBS, 0.05% Triton at
room temperature prior to fluorescence activated cell sorting analysis.

Detectors on the fluorescence activated cell sorting machine were optimized
using the beads labeled with FAM:HEX 100:0 and 0:100, and the "ratio sorting
gates" using beads labeled 50:50. After fluorescence activated cell sorting
optimization, the beads were mixed and passed through a 62 µm mesh to eliminate
bead doublets that clog the 70 µm sorting tip. Approximately 10,000 beads in
sort gates R2 and R3 were collected and then rerun on the scanner to
demonstrate sorting efficiency.

Panel A of FIGURE 5 shows the mixed population of beads; all seven bead
subpopulations can be seen as distinct clusters. Panel B of FIGURE 5 shows the
FAM/HEX fluorescence ratio of the mixed population of beads and the R3 gate
used to sort the beads of interest (see, Panel B1). This ratio provides
resolution of the beads that have HEX>FAM. Panel B2 shows the R2 gate used to
sort beads of interest; the HEX/FAM ratio provides resolution of beads where
FAM>HEX. Panel C of FIGURE 5 shows the beads that were sorted using the R3 sort
gate in Panel B1 and were re-run on the sorter to demonstrate that only the
beads of interest were collected. Panel D of FIGURE 5 shows the beads that were
sorted using the R2 sort gate in Panel B2 and were re-run on the sorter to
demonstrate that only the beads of interest were collected.

G. Example 7: Pool And Split Synthesis Of Random Oligomers 

The following example shows the pool and split synthesis strategy for the
generation of random oligomers (N-mers).

As depicted in FIGURE 6, after an initial round of base couplings in four
separate synthesis columns, the resins from each column are pooled and
redistributed (split) equally into four new columns. The mixing process is
completed after each new round of coupling to generate random N-mers, where N
is the length of the oligonucleotide.

H. Example 8: Pool And Split Synthesis Of 24-mers 

The following example illustrates the concept of the "pool and split"
synthesis strategy for the synthesis of 24-mers comprising 3 unique 8-mers in
tandem.

To synthesize 24-mers that are roughly one million-fold degenerate, a 96-well
format is used. After 8 rounds of coupling, each well (or column) has obtained
a unique 8-mer sequence; the contents of the 96 columns are pooled, mixed, and
redistributed (split) into another 96 columns for a further 8 rounds of base
coupling. The process is repeated again to generate the final 24-mers. See,
FIGURE 7. For clarity only one recipient well of each run and 8 donor wells are
shown as being mixed.
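
The pool and split procedure described above can be sketched in a few lines
of Python. This is only an illustrative simulation under stated assumptions:
the function names, the number of beads per column, and the short two-base
"units" in the usage lines are hypothetical and stand in for the 8-mers
coupled in each well. The sketch shows why 96 wells and three rounds give
96 x 96 x 96 = 884,736 possible 24-mers, i.e., roughly one million.

```python
import random


def library_complexity(columns, rounds):
    """Number of distinct sequences possible after pool-and-split synthesis."""
    return columns ** rounds


def pool_and_split(units, rounds, beads_per_column=50, seed=0):
    """Simulate pool-and-split synthesis and return one sequence per bead.

    units  : one sequence block per column (hypothetical stand-ins for 8-mers)
    rounds : number of coupling rounds (3 for the 24-mers in the text)
    """
    rng = random.Random(seed)
    n_columns = len(units)
    beads = [""] * (beads_per_column * n_columns)
    for _ in range(rounds):
        rng.shuffle(beads)  # pool and mix the resin from all columns
        # split into equal portions, then couple each column's unit
        beads = [seq + units[i % n_columns] for i, seq in enumerate(beads)]
    return beads


print(library_complexity(96, 3))  # 96 wells, three rounds of 8-mer coupling

demo = pool_and_split(["AC", "GT", "TG", "CA"], rounds=3)
print(len(set(demo)), "distinct sequences on", len(demo), "simulated beads")
```

Each simulated bead carries exactly one sequence, which mirrors the key
property of the strategy: a bead only ever sits in one column per round.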

I. Example 9: Synthesis Of Sequence Identifier Tags

The following example describes the synthesis of sequence identifier tags.

Two strategies are used to capture specific sequences from a complex mixture
of nucleic acids. The first involves use of random (or a biased subset of
random sequences), e.g., 15-mers attached to beads. In practice only about two
million of the total one billion possible 15-mers need be used. These 15-mers
will bind to sequences present in the target population of nucleic acid
(usually cDNA) based on the likelihood that a given sequence contains a
particular 15-mer complementary sequence within its bounds. The cDNA is
typically generated by random priming mRNA with an appropriate primer. The
beads do not interact with the primers, but rather with unique sequences within
the cDNA itself.

An alternative strategy involves hybridization of bead-conjugated
oligonucleotides to cDNA complementary to the 3' ends of mRNAs. In this
approach, the beads contain a stretch of A residues (e.g., 15 A's) followed by
a stretch of random or pseudo-random sequence (e.g., 10 residues of random
sequence). Target cDNA is prepared by oligo-(dT)-priming and is labeled with a
fluorophore. When this cDNA is hybridized to the beads at high stringency the
unique 3' cDNA sequence adjacent to the oligo-dT stretch finds its complement
among the unique 10 basepair sequences adjacent to the oligo-dA stretch on the
bead. Thus, the specificity is determined by the unique sequence, but the
hybridization and washing temperatures can be relatively high, e.g., 60-70°C.
In a preferred embodiment of the invention, oligonucleotides comprising a
stretch of from about 5 to about 25 adenosine residues at the 3' end, and a
stretch of from about 8 to about 16 nucleotides of random sequence at the 5'
end are attached to solid supports such as beads.

A different strategy involves priming of the mRNA with a mixture of 24-mers
(one million-fold degenerate in total). The primers also have a constant region
(linker) at their 5' ends and a random N-mer (e.g., hexamer) at their 3' ends
for random priming. cDNA clones generated by this method can be captured
through the 24-mer sequences that they carry from the original priming event
that produced them. FIGURE 8 shows this use of sequence identifier tags.

The choice of primer sequences can be made based on a simple algorithm
implemented on a computer. Random 8-mer sequences can be generated with a
variety of constraints. For a given set of, e.g., 100 sequences, each 8-mer
that is generated by computer can be examined for G/C content and secondary
structure. Sequences that have unacceptable G/C content (e.g., this might be
simply any sequence that is not 50% G/C) or secondary structure potential
(e.g., any sequence that has self complementarity of greater than 3 consecutive
bases) can be rejected. Of the roughly 64,000 possible 8-mers, there are 17,920
that contain 50% G or C residues. Therefore, the computational problem is
reduced to searching this set for those that are mutually compatible according
to the criteria that they are minimally cross-hybridizing and have minimal
secondary structure. This problem can be solved in a variety of ways known in
the art. Most importantly, the sequences are chosen so that they differ
maximally in primary sequence from one another; there are no stretches of
identity that extend beyond 2-3 bases among the set of 100. Applying these
constraints on the choice of 8-mers produces a set of 100 sequences predicted
to be optimal as identifier tag components. Such constraints can be applied to
each set of identifier tag units that is generated. In the end, the final,
e.g., 24-mers, can be examined to ensure that each member of the final set has
minimal self complementarity (or complementarity with other set members).
Problem sequences can be identified and rejected at this point, and these
sequences can be replaced by others generated in the initial 8-mer sets.
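
The screening steps described above can be sketched as follows. This is a
minimal illustration, not the procedure actually used: the function names, the
window of 4 bases used for both the self-complementarity and the shared-stretch
tests, and the greedy selection order are all assumptions made for the example.
It enumerates every 8-mer, keeps those with exactly 50% G/C (reproducing the
17,920 figure), rejects sequences with more than 3 consecutive
self-complementary bases, and then picks a small set whose members share no
stretch of 4 or more identical bases. Cross-hybridization between different
tags is not checked here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_gc_balanced(seq):
    """True if exactly half of the bases are G or C."""
    return 2 * sum(base in "GC" for base in seq) == len(seq)


def has_self_complementarity(seq, run=4):
    """True if some stretch of `run` bases can pair with part of the same
    sequence, i.e. the stretch also occurs in the reverse complement."""
    rc = reverse_complement(seq)
    return any(seq[i:i + run] in rc for i in range(len(seq) - run + 1))


def share_stretch(a, b, run=4):
    """True if a and b have `run` or more consecutive bases in common."""
    kmers = {a[i:i + run] for i in range(len(a) - run + 1)}
    return any(b[i:i + run] in kmers for i in range(len(b) - run + 1))


def pick_tags(candidates, n_tags, run=4):
    """Greedy selection of mutually dissimilar tags (illustrative only)."""
    chosen = []
    for cand in candidates:
        if all(not share_stretch(cand, other, run) for other in chosen):
            chosen.append(cand)
            if len(chosen) == n_tags:
                break
    return chosen


all_8mers = ["".join(p) for p in product("ACGT", repeat=8)]
balanced = [s for s in all_8mers if is_gc_balanced(s)]
usable = [s for s in balanced if not has_self_complementarity(s)]
tags = pick_tags(usable, n_tags=20)
print(len(all_8mers), len(balanced), len(usable), len(tags))
```

A greedy pass is only one of the "variety of ways" mentioned above; it does not
guarantee the largest possible compatible set.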

The synthesis can be performed on standard automated DNA synthesizers such
as those sold by Applied Biosystems or Pharmacia. Because a relatively large
number of parallel syntheses must be performed (e.g., 100), it is helpful to
use synthesizers that have many columns. Alternatively, synthesizers with fewer
channels can be employed in succession so that 100 different sequences are
generated. These 100 columns are broken down and the resin contained within is
collected and pooled. It is then split into 100 equal portions either by
weighing out equal masses or by resuspending in a convenient volume of liquid
(e.g., acetonitrile) and then pipetting equal volumes. One hundred new columns
are then fabricated using the mixed contents of the previous set, and the
synthesis is repeated. The pool and split process is completed as many times as
necessary to generate the final combinatorial set of beads.

J. Example 10: Hybridization Discrimination Of Sequence 
Identifier Tags 

The following example illustrates the hybridization discrimination of
sequence identifier tags, as depicted in FIGURE 9.

The 24-mers on the beads should bind with high specificity to their
complements on the cloned cDNA. Other than a perfect match, the most similar
hybrids that might ensue consist of complexes that have multiple mismatches in
one, differing on average roughly 24°C in their melting point (Tm). Estimating
Tm values for specific sequences is difficult and the calculation involves free
energy difference calculations if it is to be performed rigorously. However,
even when strict methods are employed the results can vary from experimental
values. There are several computer programs that estimate Tm's for defined
oligonucleotide sequences. Alternatively, a simple formula (Tm = 4(number of
G/C basepairs) + 2(number of A/T basepairs)) gives a reasonably accurate
indication of the Tm of a specific sequence. If the, e.g., 24-mers described
infra are generated with 50% G/C content, then the predicted Tm of a particular
24-mer is expected to be 72°C under typical hybridization conditions. This Tm
depends on several factors, especially salt concentration, that can be
manipulated to alter the Tm. Since 24-mers that are most similar to one another
differ in one of their 8-mer units, this should cause a decrease in Tm of the
mismatched identifier sequence of, e.g., 24°C.
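
The simple formula quoted above is easy to evaluate directly. The sketch below
is illustrative only: the function name is an assumption, and the three 8-mers
joined into a 24-mer are arbitrary 50% G/C examples rather than a tag taken
from any particular library. With 12 G/C and 12 A/T positions the formula gives
4 x 12 + 2 x 12 = 72, and a single 50% G/C 8-mer unit contributes
4 x 4 + 2 x 4 = 24.

```python
def simple_tm(seq):
    """Tm estimate from the rule Tm = 4*(G/C count) + 2*(A/T count), in deg C."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 4 * gc + 2 * at


unit_a, unit_b, unit_c = "GCTGTACA", "CACGAGAT", "ATCTCGTC"  # example 8-mers
tag_24mer = unit_a + unit_b + unit_c

print(simple_tm(tag_24mer))  # estimate for the full 24-mer
print(simple_tm(unit_a))     # contribution of one 8-mer unit
```

The formula ignores salt concentration and sequence context, so it is only a
rough guide of the kind the text describes.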

K. Example 11: Synthesis Of cDNA Comprising Sequence
Identifier Tags

The following example describes the generation of cDNA comprising sequence
identifier tags.

A typical reaction to generate double-stranded cDNA marked with identifier
tags involves first strand synthesis from a primer that contains the 24-mers
and associated sequences. This first strand is converted into a second strand
by one of several second strand synthesis procedures. The ends of these
double-stranded cDNA fragments are repaired and inserted into an appropriate
cloning vector for introduction into E. coli. See, FIGURE 10. For first strand
synthesis, the primers contain the degenerate population of, e.g., 24-mers
discussed, infra. If the synthesis involves oligo(dT) priming, the 3' end of
the primer includes a stretch of 8-16 T residues; if random-priming is desired,
the 3' end includes a random sequence, e.g., a hexamer of random sequence. In
certain cases, the 5' end of both random primer and oligo(dT) primers may
include an additional linker sequence useful in cloning or in subsequent PCR
experiments; e.g., a restriction endonuclease recognition sequence. Conditions
for first strand synthesis are known in the art. For example, poly(A) selected
RNA is denatured in 10 mM methylmercuric hydroxide at 65°C for 5 minutes,
followed by addition of 2-mercaptoethanol to 32 mM. Primer is added to a
concentration of 30 µM, together with reverse transcriptase buffer (e.g., from
BRL), 5 mM DTT, 400 µM dNTP's, 0.8 units/µl RNasin, and Superscript II reverse
transcriptase at 200 units/mg of RNA. After one hour at 37°C, the enzyme is
heat denatured at 65°C and the first strand cDNA is purified by gel
chromatography, e.g., on Sepharose CL-4B columns. Methods for second strand
synthesis are also known in the art. One procedure involves treatment of first
strand material in 25 mM Tris acetate pH 7.7, 50 mM KOAc, 10 mM Mg(OAc)2,
10 mM (NH4)2SO4, 5 mM DTT, 50 µM dNTP's, 150 µM NAD, 100 µg/ml BSA, and RNase
H, E. coli ligase, DNA polymerase I at 1.6, 4.0, and 40 units/µg input cDNA,
respectively. The reaction proceeds at 14°C overnight, and double-stranded cDNA
is purified on Qiaex beads (Qiagen, Chatsworth, California). To polish the
ends, double-stranded DNA is treated for 30 minutes at 15°C with T4 DNA
polymerase and T7 DNA polymerase at 3 J and 6.7 units/µg input first strand
cDNA, respectively.

L. Example 12: Enrichment And Recovery

The following example depicts enrichment and recovery of nucleic acids.

cDNAs prepared from two different sources are labeled with fluorophores
(e.g., HEX in one case and FAM in another). The labeling can be accomplished in
many ways known in the art. For example, the fluorophore can be attached at the
5' end of a primer used to reverse transcribe mRNA, or alternatively, to
amplify from cDNA template suitable for PCR. The fluorophore can also be
incorporated during synthesis by DNA polymerases as described in Schena et al.,
supra. cDNAs from two samples are mixed together and hybridized with the beads.
Bound cDNA is monitored by fluorescence signal at or near the two emission
maxima as the beads pass through the fluorescence activated cell sorting
excitation/detection apparatus. The labeled cDNA is mixed with cognate beads so
that, for example, one million beads are placed in hybridization buffer (e.g.,
5x SSPE, 0.1% Triton) with target cDNA at a final concentration of 10 pg/ml.
The reaction is allowed to proceed (with mixing) for 10 hours at 30°C, at which
time the beads are washed three times in 1x SSPE at room temperature. The beads
are then diluted into 1 ml PBS plus 0.05% Triton and run through a fluorescence
activated cell sorting machine exciting the dyes at 488 nm with an argon laser
and measuring fluorescence intensity at two separate wavelengths (530 nm and
585 nm). Initially, the fluorescence activated cell sorting machine is "tuned"
with beads that are labeled exclusively with FAM or with HEX so that a scaling
factor can be applied to the intensity measurements; the scaling factor is
simply the ratio of the mean FAM and HEX signals at the two emission
wavelengths. This factor provides a correction for differences in labeling
efficiency, excitation and emission strengths, etc. The scaling factor can be
applied to the real bead fluorescence ratio measurements. Most beads should
thus have scaled ratios near one, while a few should deviate. Those that
deviate can be collected by sorting, and used individually to provide templates
for PCR amplification using primers derived from the two ends of the cDNA.
Amplified material can then be reintroduced into cells for another round of
enrichment, or can be sequenced, either directly or after cloning first in
E. coli. See, FIGURE 11.
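
The scaling step described above amounts to a small calculation, sketched
below. All names and the numerical intensities are hypothetical, and the
two-fold cutoff used to call a bead "deviating" is an assumption chosen only
for illustration; the text does not specify a threshold.

```python
from statistics import mean


def scaling_factor(fam_only_signals, hex_only_signals):
    """Ratio of the mean FAM signal to the mean HEX signal from tuning beads."""
    return mean(fam_only_signals) / mean(hex_only_signals)


def scaled_ratio(fam, hex_, factor):
    """FAM/HEX ratio of one bead, corrected by the scaling factor."""
    return (fam / hex_) / factor


def deviating_beads(beads, factor, fold=2.0):
    """Indices of beads whose scaled ratio differs from one by more than
    `fold` in either direction (hypothetical cutoff)."""
    hits = []
    for index, (fam, hex_) in enumerate(beads):
        ratio = scaled_ratio(fam, hex_, factor)
        if ratio > fold or ratio < 1.0 / fold:
            hits.append(index)
    return hits


# Hypothetical tuning data: beads labeled only with FAM or only with HEX.
factor = scaling_factor([200.0, 220.0, 180.0], [100.0, 110.0, 90.0])

# Hypothetical (FAM, HEX) intensities measured on three experimental beads.
beads = [(200.0, 100.0), (800.0, 100.0), (50.0, 100.0)]
print(factor, deviating_beads(beads, factor))
```

Beads returned by `deviating_beads` correspond to those that would be collected
by sorting for PCR amplification.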

M. Example 13: Post-Passage Library Comparison 

The following exemplifies post-passage library comparison.

A cDNA library, represented in FIGURE 12 as double helices, is introduced
separately into two cell types. The library can be introduced into cells in a
variety of ways including transfection, electroporation, or viral infection.
Methods for gene transfer are known in the art. Stable transformants that carry
specific library sequences can be isolated using selectable markers carried on
the expression vectors used in the gene transfer experiments. Alternatively,
the library sequences can be propagated and expressed transiently. After either
isolation of stable transformants or establishment of transient cultures, the
library sequences can be re-isolated from each cell population using, e.g., PCR
to amplify the resident library sequences. PCR primers depend on the details of
the library but can be chosen typically so that standard PCR conditions apply.
The sequences from the two independently passaged libraries can be labeled and
compared by hybridization to beads followed by fluorescence activated cell
sorting analysis as in Example 10, infra. Beads that carry sequences from the
initial library that have differentially propagated in the two cell populations
are visualized by deviations from unity of fluorescence intensity ratios of the
labels on sequences harvested from each cell population. These beads of
interest can be isolated, and their attached library sequences can be eluted
and subjected to PCR for analysis.

N. Example 14: Normalization Of cDNA Libraries 

The following example illustrates the normalization of a cDNA library.

cDNA libraries are normalized by hybridization to beads using, e.g., the
24-mer oligonucleotides. The bound cDNA is hybridized in a second step with
labeled cDNA from a particular cell type. Small but detectable amounts of
24-mer complement oligonucleotides (labeled with a fluorophore distinct from
the cDNA fluorophore) are included in the hybridization to serve as a
normalizing signal. (The order of hybridization steps may be varied.) The beads
are sorted using fluorescence activated cell sorting into bins that reflect the
ratios of the two signals. These bins are amplified independently and remixed
in equal amounts with one another to form the final normalized pool of cDNAs.
See, FIGURE 13.

Alternatively, oligonucleotides of random or pseudo-random sequence (e.g.,
random 15-mers) on beads can be used to normalize a library. In this case a
labeled cDNA is hybridized to the beads via the 15-mers and sorted based solely
on its signal.

O. Example 15: Quantitative Comparison Of mRNA Levels 

The following example illustrates the quantitative comparison of mRNA
levels.

cDNA libraries that contain the 24-mer identifier tags are hybridized in
solution to labeled cDNA produced from two different sources of mRNA, one
labeled with, e.g., FAM, one with, e.g., HEX. This mixture is subsequently
hybridized to beads that contain 24-mer complements. (The order of these two
hybridization steps may be inverted.) The beads are then sorted based on the
FAM/HEX fluorescence ratios. The relevant populations of beads are isolated,
and cDNAs containing the tags are eluted and used as templates for PCR. The
amplified cDNAs are sequenced, with or without cloning, or passed through
cells. See, FIGURE 14.

P. Example 16: Kinetic Genetics 

The following example illustrates the use of the present invention for
kinetic genetics.

The procedure involves passage of an, e.g., cDNA library through two
different cell types, in FIGURES 15A and 15B represented by circles or oblong
trapezoids. The DNA is introduced using transient expression procedures that
are known in the art such as electroporation, lipofection, viral infection,
DEAE dextran, or calcium phosphate precipitation. The cells are allowed to
undergo several rounds of cell division, typically between 5 and 20 divisions.
Because most transferred mammalian sequences can replicate in host mammalian
cells extrachromosomally (or within a chromosomal insertion site),
proliferation of the cells is expected to result in multiplication of the
transferred sequences. However, since the transferred sequences typically lack
a centromere or other sequence that can ensure proper segregation, continued
propagation of the cells results in gradual loss of transferred DNA. However,
over relatively short numbers of cell divisions, it is likely that sequences
that either confer a growth advantage to the host cell, or are neutral in their
effect on growth, will increase in abundance as the cells divide. In contrast,
sequences that do not replicate or have deleterious effects on cell growth will
be preferentially lost. For example, ten cell divisions should result in an
increase of 2^10 (or roughly one thousand) in the mass of a properly
replicating and segregating sequence. If, however, sequence segregation is
random during division, half the time one daughter cell does not inherit a
sequence (assuming two initial copies per parental cell). This may result in
decreased amplification to, e.g., (1.5)^10 (or roughly sixty). However, these
transferred sequences are able to reproduce and can gain a selective advantage
over any transferred sequence that causes cell death or inhibits cell growth.
If a particular sequence causes cell death in one cell type and has a neutral
effect in another, a post-passage comparison of the abundance of that sequence
in the two passaged libraries may reveal a significant difference between the
libraries.

A potential problem with using transient expression in mammalian cells is
the possibility of multiple transferred sequences per cell; i.e., a single cell
harbors more than one transferred sequence and thus the selection may apply to
"bystander" sequences as well as the sequence of interest. This problem can be
circumvented by either multiple rounds of passage (passage, re-isolation of the
library, and reintroduction into cells) or methods such as viral infection
which limit the number of transferred sequences per cell.

In summary, transient expression has the considerable advantage of speed,
ease, and flexibility (since most cells can be transfected transiently), but
the disadvantage that the enrichment levels may not be as high as with stably
expressing cells. Imperfect replication/segregation will cause increases in
neutral sequences that are subgeometric. However, since the "signal" takes the
form of relative abundance differences between sequences present in two
independently passaged libraries, and since multiple enrichment cycles (see,
infra) can be performed, the method provides a rapid, general mechanism for
establishing the role of specific sequences on cell growth. For example, if two
different sequences from a genetic library, A and B, are propagated in two
different cell types for ten (10) generations in which A is neutral but B
causes growth arrest in one of the cell types, the following considerations
apply: after 10 generations A will have increased, e.g., 60-fold in both cell
types so that the ratio of A abundance in both post-passaged libraries is one.
However, B increases 60-fold in one cell type but not at all in the other;
thus, its ratio is 60. A single round of passaging, therefore, results in,
e.g., a 60-fold change in the abundance ratio of B in the two passaged
libraries. The invention described herein provides the means to detect and
isolate sequences that behave in this fashion.
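
The arithmetic behind this example can be checked with a short sketch. The
function names are hypothetical; the per-division factors of 2 and 1.5 and the
ten divisions are the illustrative values used in the text, not measured
quantities.

```python
def fold_increase(per_division_factor, divisions):
    """Fold amplification of a sequence after a number of cell divisions."""
    return per_division_factor ** divisions


def abundance_ratio(fold_in_cell_type_1, fold_in_cell_type_2):
    """Ratio of a sequence's abundance between two passaged libraries."""
    return fold_in_cell_type_1 / fold_in_cell_type_2


ideal = fold_increase(2, 10)        # perfect replication and segregation
imperfect = fold_increase(1.5, 10)  # random segregation, as in the text
print(ideal, round(imperfect, 1))

# Sequence A is neutral in both cell types; sequence B arrests growth in one,
# so it does not increase there (fold increase of 1).
ratio_a = abundance_ratio(imperfect, imperfect)
ratio_b = abundance_ratio(imperfect, 1.0)
print(ratio_a, round(ratio_b, 1))
```

The unrounded value of 1.5^10 is a little under 58, which the text rounds to
"roughly sixty".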

To increase the likelihood that DNA sequences may have effects on cell
growth, genetic libraries are constructed in expression vectors suitable for
introduction into the host cells and designed to facilitate transcription and
translation of the DNA insert sequences from the library. For example, in
mammalian cells vectors that contain cytomegalovirus enhancer sequences are
useful, as are numerous others. In yeast, sequences that contain the GAL4
enhancer and/or promoter are useful for this purpose. The genetic library used
in these post-passage experiments may consist of full-length cDNA clones, cDNA
fragments, or genomic DNA fragments. The library may also consist of random or
semi-random insert sequences, preferably fused to or inserted into sequences
from another relatively stable protein. Such sequences have been termed
"perturbagens". See, U.S. Patent Application Serial No. 08/699,266, filed
August 19, 1996, incorporated herein by reference in its entirety.

The library sequences, once introduced into and propagated in a particular
pair of cell types, may be isolated from each cell type by several methods
including PCR (using primer sites that flank the insert), or by transformation
of bulk DNA into suitable host cells such as E. coli, and recovery of clones
that contain selectable markers present on the expression vector such as
ampicillin resistance genes.

The library sequences, once recovered, can be amplified and labeled with,
e.g., fluorophores such as HEX and FAM (HEX for one sample, FAM for the other).
These labeled post-passage library inserts can be hybridized to beads that
contain complements of identifier tags that are attached to the library inserts
during the original construction of the library. Fluorescence activated cell
sorting analysis as described, infra, can then detect beads that have skewed
HEX/FAM intensity ratios, and hence sequences that are candidates for inducing
selective cell growth, arrest, or death in one cell type and not the other.

Q. Example 17: Synthesis of Identifier Tag Sequences On and Off
Beads

Choice of sequences for identifier tags: As discussed above, several issues
were considered in choosing identifier tag sequences. First, the identifier
sequences must permit specific hybridization in relatively complex mixtures so
that their cognate sequences can be fished out from the mix and attached via
Watson-Crick basepairing to the beads for analysis and sorting. Second, but
equally important, the identifier sequences must encompass sufficient diversity
so that large numbers, thousands to millions, can be examined in single
experiments. Third, the synthesis of such sequences must not be prohibitively
costly or labor intensive. Balancing all the above considerations, we adopted a
strategy that uses combinatorial synthesis of three units of 8 nucleotides
each.

Identifier tag sequences were synthesized on and off beads: Identifier tag
sequences were synthesized as described below. If attached to beads, identifier
tag sequences are preferably attached in a manner that prevents hydrolysis of
the bead linkage during base deprotection.

Reagents: PerSeptive Biosystems

1. DMT-D-Adenosine (N6-Benzoyl) Cyanoethyl Phosphoramidite
2. DMT-D-Cytidine (N6-Benzoyl) Cyanoethyl Phosphoramidite
3. DMT-D-Guanosine (N6-Isobutyryl) Cyanoethyl Phosphoramidite
4. DMT-Thymidine Cyanoethyl Phosphoramidite
5. Activator Solution: 91.0-99.0% acetonitrile, 1.0-9.0% 1-H tetrazole






8-mer no.   Sequence    RC no.   Sequence
10          GCTGTACA    RC6      TTGACTGG
11          CACGAGAT    RC7      TTACGCAG
12          ATCTCGTC    RC8      TTTCTCGG
13          TAAGCCAC    RC9      GGAGACTA
14          TTTCTGCC    RC10     TGTACAGC
15          GCAACATC    RC11     ATCTCGTG
16          ACATGGTG    RC12     GACGAGAT
17          AATACGCG    RC13     GTGGCTTA
18          AATTCCGC    RC14     GGCAGAAA
19          AATCGTCC    RC15     GATGTTGC
20          AATGGAGG    RC16     CACCATGT
21          AACTAGGC    RC17     CGCGTATT
22          AACCTACC    RC18     GCGGAATT
23          AACGTTGG    RC19     GGACGATT
24          AAGTACGG    RC20     CCTCCATT
25          AAGCTTCG    RC21     GCCTAGTT
26          AAGGTAGC    RC22     GGTAGGTT
27          ATACCAGC    RC23     CCAACGTT
28          ATAGCTCG    RC24     CCGTACTT
29          ATTCCTGG    RC25     CGAAGCTT
30          ATTGCACC    RC26     GCTACCTT
31          ATCACCAG    RC27     GCTGGTAT
32          ATCCAAGG    RC28     CGAGCTAT
33          ATCGATCC    RC29     CCAGGAAT
34          ATGACGAC    RC30     GGTGCAAT
35          ATGTCCTG    RC31     CTGGTGAT
36          ATGCATGC    RC32     CCTTGGAT
37          ATGGAACG    RC33     GGATCGAT
38          ACAAGCAC    RC34     GTCGTCAT
39          ACACACCA    RC35     CAGGACAT
40          ACAGAGGA    RC36     GCATGCAT
41          ACTAGGCA    RC37     CGTTCCAT
42          ACTTGCGT    RC38     GTGCTTGT
43          TGTGCTGA    RC39     TGGTGTGT
44          TGCCAGTA    RC40     TCCTCTCT
                        RC41     TGCCTAGT
                        RC42     ACGCAAGT
                        RC43     TCAGCACA
                        RC44     TACTGGCA
                        RC45     ACTGACCA
                        RC46     GTATCCCA
                        RC47     TCCAGTTG
                        RC48     GGTCTATG
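
The RC entries listed above appear to be the reverse complements of the
8-mers carrying the same number (for example, 8-mer 10 and RC10, or 8-mer 11
and RC11). A minimal sketch for deriving or checking such pairs is given below;
the function names are assumptions made for illustration, and only the two
example pairs are taken from the listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_reverse_complement_pair(tag, rc):
    """True if `rc` is exactly the reverse complement of `tag`."""
    return reverse_complement(tag) == rc.upper()


pairs = {
    "GCTGTACA": "TGTACAGC",  # 8-mer 10 and RC10
    "CACGAGAT": "ATCTCGTG",  # 8-mer 11 and RC11
}
for tag, rc in pairs.items():
    print(tag, rc, is_reverse_complement_pair(tag, rc))
```

Running the same check over a complete listing is a quick way to catch
transcription errors in either column.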







Synthesis of 13,824-fold complex ID bead pools

Synthesis of beads was performed in three rounds, as follows: 

Round 1: 16 Glen Research Twist columns loaded with 15 mg of Pharmacia 30 HL
resin each were put on a synthesizer and subjected to synthesis of 8-mers 1-16.
These 8-mers each had an extra sequence "58T" at the 3' end. The T is a
"ghost", that is, it is only there because the synthesizer thinks it is always
synthesizing on a column with a base already present and this needs to be
included in the sequence. The "8" corresponds to bottle 8 on the machine, which
contained a 1:60 dilution of a 0.1 M solution of 18-atom spacer. "5"
corresponds to bottle 5, which contained a 0.1 M solution of 18-atom spacer.
The protocol used here was "bottle8 CAP/0.2 µmole", which is the same as a
regular 0.2 µmole protocol, with the exception of anything delivered from
bottle 8 (see Protocols in Tables 1 and 2, below). At the end of this round,
there are 16 columns, each with 30 HL beads having 2 spacers and a unique 8-mer
from 8-mers 1-16. This synthesis was done "trityl-on".

Round 1(a): 8 columns with 15 mg of Pharmacia 30 HL resin were subjected to
synthesis of 8-mers 17-24, exactly as in Round 1. The beads from the 24
columns, containing 8-mers 1-24, were mixed by flushing beads from columns with
acetonitrile into a single tube. The tube was mixed and the beads re-aliquoted
into the 24 columns. The total volume of beads plus acetonitrile was 12 ml. The
beads were mixed thoroughly before each aliquot of 0.5 ml was taken and added
to a column on a vacuum manifold.

Round 2: 16 of the columns from the previous step were subjected to 
synthesis of 8-mers 25-40. The 8-mer sequences each had an extra "T" at the 3' 
end, again, a "ghost" for the benefit of the synthesizer. The protocol used was 

1 5 "MOSS 0.2 ^mole'^ the protocol provided by PerSeptive. This synthesis was done 
"trityl-on". 

Round 2faV The remaining 8 columns were subjected to synthesis of 8- 
mers 41-48, exactly as in Round 2. The beads were then mixed again, exactly as 
before, and were re-aliquoted into the 24 columns once again. 
20 Round 3: 16 of the columns from the previous step were subjected to 

synthesis of 8-mers 1-16, Again, a "ghost T" was added at the 3' end. The 
protocol used was "MOSS 0.2 nmole", and this round of synthesis was done "trityl- 
ofF\ 

Round 3(aV The remaining 8 columns were subjected to synthesis of 8- 
25 mers 17-24 (plus "ghost T") exactly as in Round 3 
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Beads were flushed from columns into glass vials with concentrated ammonium 
hydroxide and allowed to sit at room temperature overnight to deprotect. Beads 
were then washed four times with 2x SSPE and resuspended in 2x SSPE. 
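
The three rounds described above follow a pool-and-split scheme: 24 columns per round, with 8-mers 1-24 in the first round, 25-48 in the second and 1-24 again in the third, giving 24 x 24 x 24 = 13,824 possible 24-mer tags. The following is a minimal simulation sketch only. The bead count, the random seed and all function names are hypothetical, and ideal mixing with equal aliquots is assumed.

```python
# Pool-and-split sketch: each bead passes through one column per round and
# receives the 8-mer unit synthesized on that column.
import random
from itertools import product

# 8-mer numbers used in each round, as stated in the text.
ROUND_UNITS = [range(1, 25), range(25, 49), range(1, 25)]


def all_tags():
    """Every possible tag, as a tuple of 8-mer numbers in order of addition."""
    return list(product(*ROUND_UNITS))


def split_and_pool(n_beads, seed=0):
    """Simulate the three rounds for n_beads beads (ideal mixing assumed)."""
    rng = random.Random(seed)
    beads = [() for _ in range(n_beads)]
    for units in ROUND_UNITS:
        units = list(units)
        rng.shuffle(beads)  # pooling and mixing in a single tube
        # re-aliquoting: bead i goes to column i mod 24 and gains that unit
        beads = [tag + (units[i % len(units)],) for i, tag in enumerate(beads)]
    return beads


if __name__ == "__main__":
    print(len(all_tags()))                      # theoretical complexity
    print(len(set(split_and_pool(200_000))))    # tags seen in one simulation
```

Because synthesis proceeds from the 3' end, the first unit in each tuple corresponds to the 3'-most 8-mer of the tag on the bead.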

Synthesis of 13,824-fold complex complement oligo pools

Synthesis of complements was done in three rounds as follows:

Round 1: 16 Glen Research Twist columns loaded with 500 Angstrom CPG
in the amount required for a 1 µmole synthesis each were put on the synthesizer and
subjected to synthesis of RC 8-mers 1-16. The synthesis was done "trityl-on" and the
"MOSS 0.2 µmole" protocol was used.

Round 1(a): 8 columns with 1 µmole 500 Angstrom CPG were subjected to
synthesis of RC 8-mers 17-24, exactly as in Round 1. The resin from the 24
columns, containing RC 8-mers 1-24, was mixed by flushing beads from columns with
acetonitrile into a single tube. The tube was mixed and the beads re-aliquoted into
the 24 columns. The total volume of resin plus acetonitrile was 12 ml. The beads
were mixed thoroughly before each aliquot of 0.5 ml was taken and added to a
column on a vacuum manifold.

Round 2: 16 of the columns from the previous step were subjected to
synthesis of RC 8-mers 25-40. The 8-mer sequences each had an extra "T" at the 3'
end, again a "ghost" for the benefit of the synthesizer. The protocol used was
"MOSS 0.2 µmole", the protocol provided by PerSeptive. This synthesis was done
"trityl-on".

Round 2(a): The remaining 8 columns were subjected to synthesis of
RC 8-mers 41-48, exactly as in Round 2. The beads were again mixed, exactly as
before, re-aliquoting into the 24 columns once again.

Round 3: 16 of the columns from the previous step were subjected to
synthesis of RC 8-mers 1-16. Again, a "ghost T" was added at the 3' end. The
protocol used was "MOSS 0.2 µmole", and this round of synthesis was done
"trityl-on".

Round 3(a): The remaining 8 columns were subjected to synthesis of
RC 8-mers 17-24 (plus "ghost T") exactly as in Round 3.

The resin from columns 1-3 was mixed to make C Pool 1.
The resin from columns 4-6 was mixed to make C Pool 2.
The resin from columns 7-9 was mixed to make C Pool 3.
The resin from columns 10-12 was mixed to make C Pool 4.
The resin from columns 13-15 was mixed to make C Pool 5.
The resin from columns 16-18 was mixed to make C Pool 6.
The resin from columns 19-21 was mixed to make C Pool 7.
The resin from columns 22-24 was mixed to make C Pool 8.

The new pools of resin were then aliquoted into 10 columns. Column 1
contained resin from pool 1, column 2 contained resin from pool 2, column 3
contained resin from pool 3, columns 4 and 5 contained resin from pool 4, columns
6 and 7 contained resin from pool 5, column 8 contained resin from pool 6, column
9 contained resin from pool 7 and column 10 contained resin from pool 8.

Columns 1-4 and 6 were subjected to a synthesis adding only from bottle 6
(PerSeptive Biosystems' FluoreDite). Sequence was "6T", the T being a 3' ghost.

Columns 5 and 7-10 were subjected to a synthesis adding only from bottle 7
(Glen Research HEX-phosphoramidite). Sequence was "7T", the T being a 3'
ghost.

Oligos were cleaved from columns using 1 ml of concentrated ammonium
hydroxide by attaching two syringes, one containing the ammonium hydroxide, to
either end of the column and pushing gently back and forth about 10 times. This
was allowed to sit (wrapped in foil) for 45 minutes, pushed back and forth 10 times,
and allowed to sit for another 45 minutes. The cleaved oligos were then flushed
into glass vials with concentrated ammonium hydroxide and allowed to sit at room
temperature overnight to deprotect. Oligos were then OPC purified using Poly-Pak
II cartridges according to the manufacturer's instructions (Glen Research). Oligos
were resuspended in nano-pure water.
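
Each C Pool combines three of the 24 final-round columns. Assuming ideal mixing in the earlier rounds, each final-round column carries every combination of first- and second-round units joined to one third-round unit, so a pool of three columns would hold 3 x 24 x 24 = 1,728 of the 13,824 complements. A short sketch of that arithmetic follows; the names are illustrative and are not taken from the specification.

```python
# Complexity of the complement pools formed by combining final-round columns.
UNITS_PER_ROUND = 24
TOTAL_TAGS = UNITS_PER_ROUND ** 3

# C Pool n is made from columns 3n-2 .. 3n, as listed in the text.
C_POOLS = {n: list(range(3 * n - 2, 3 * n + 1)) for n in range(1, 9)}


def pool_complexity(n_columns: int) -> int:
    """Distinct complements in a pool, assuming ideal mixing in rounds 1 and 2."""
    return n_columns * UNITS_PER_ROUND ** 2


if __name__ == "__main__":
    for n, cols in C_POOLS.items():
        size = pool_complexity(len(cols))
        print(n, cols, size, size / TOTAL_TAGS)
```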






TABLE 1

Protocol Cycle For Capping and Spacer Addition to Resin

**************************************************************************
* Protocol Cycle Report: Cycle 8 (8) of "bottle8 CAP/0.2 umole"   Page 1 *
* Expedite(TM) Nucleic Acid Synthesis System (Workstation)               *
* Fri Dec 05 10:00:06 1997                                               *
**************************************************************************

Created:   Thu Oct 09 15:42:52 1997
Modified:  Thu Oct 09 15:42:52 1997
Project:   Expedite System
Author:    PerSeptive Biosystems
Source:    MOSS 1 umole Protocol Master
Type:      DNA, normal
Scale:     1 micromole
Comments:  MOSS protocol for the synthesis of
           DNA at the 1 umole scale.

/* Function                    Mode   Amount  Time(sec)  Description               */
/*                                    /Arg1   /Arg2                                */
$Deblocking
144 /*Index Fract. Coll.   */  NA     1       0          "Event out ON"
0   /*Default              */  WAIT   0       1.5        "Wait"
16  /*Dblk                 */  PULSE  20      0          "Dblk to column"
141 /*Trityl Mon. On/Off   */  NA     1       1          "START data collection"
16  /*Dblk                 */  PULSE  20      0          "Dblk to column"
16  /*Dblk                 */  PULSE  30      30         "Deblock"
36  /*Diverted Wsh A       */  PULSE  20      20         "Deblock"
38  /*Diverted Wsh A       */  PULSE  60      0          "Flush system with Wsh A"
141 /*Trityl Mon. On/Off   */  NA     0       1          "STOP data collection"
144 /*Index Fract. Coll.   */  NA     2       0          "Event out OFF"
$Coupling
1   /*Wsh                  */  PULSE  5       0          "Flush system with Wsh"
2   /*Act                  */  PULSE  5       0          "Flush system with Act"
41  /*Gas B                */  PULSE  1       5          "Gas B"
25  /*8 + Act              */  PULSE  7       0          "Monomer + Act to column"
2   /*Act                  */  PULSE  3       0          "Chase with Act"
1   /*Wsh                  */  PULSE  10      0          "Chase with Wsh"
1   /*Wsh                  */  PULSE  20      104        "Couple monomer"
$Capping
12  /*Wsh A                */  PULSE  100     0          "Flush system with Wsh A"
13  /*Caps                 */  PULSE  300     0          "Caps to column"
$Deblocking
0   /*Default              */  WAIT   0       900        "Default"
$Capping
12  /*Wsh A                */  PULSE  100     100        "Cap"
12  /*Wsh A                */  PULSE  300     0          "Flush system with Wsh A"
12  /*Wsh A                */  PULSE  100     0          "Flush system with Wsh A"
13  /*Caps                 */  PULSE  100     0          "Caps to column"
$Deblocking
0   /*Default              */  WAIT   [illegible]        "Default"
$Capping
12  /*Wsh A                */  PULSE  100     100        "Cap"
12  /*Wsh A                */  PULSE  300     0          "Flush system with Wsh A"
12  /*Wsh A                */  PULSE  100     0          "Flush system with Wsh A"
13  /*Caps                 */  PULSE  300     0          "Caps to column"

TABLE 1 (Continued)

* Protocol Cycle Report: Cycle 8 (8) of "bottle8 CAP/0.2 umole"   Page 2 *
* Expedite(TM) Nucleic Acid Synthesis System (Workstation)               *
* Fri Dec 05 10:00:06 1997                                               *
**************************************************************************

$Deblocking
0   /*Default              */  WAIT   0       900        "Default"
$Capping
12  /*Wsh A                */  PULSE  100     100        "Cap"
12  /*Wsh A                */  PULSE  300     0          "Flush system with Wsh A"
12  /*Wsh A                */  PULSE  100     0          "Flush system with Wsh A"
13  /*Caps                 */  PULSE  300     0          "Caps to column"
$Deblocking
0   /*Default              */  WAIT   0       900        "Default"
$Capping
12  /*Wsh A                */  PULSE  100     100        "Cap"
12  /*Wsh A                */  PULSE  300     0          "Flush system with Wsh A"
$Oxidizing
15  /*Ox                   */  PULSE  125     0          "Ox to column"
12  /*Wsh A                */  PULSE  100     0          "Flush system with Wsh A"
$Capping
13  /*Caps                 */  PULSE  50      0          "Caps to column"
12  /*Wsh A                */  PULSE  340     0          "End of cycle wash"

Table 1: Synthesis parameters for generation of combinatorial sets of
identifier sequences on beads -- capping and spacer addition to resin.
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Protocol Cycle for Oligonucleotide Synthesis 
(Beads or Oligonucleotide Complements)

**************************************************************************
* Protocol Cycle Report: Cycle A (dAdenosine) of "bottle8 CAP/0.2 umole" Page
* Expedite(TM) Nucleic Acid Synthesis System (Workstation)               *
* Fri Dec 05 09:59:42 1997                                               *
**************************************************************************

Created:   Thu Oct 09 15:42:52 1997
Modified:  Thu Oct 09 15:42:52 1997
Project:   Expedite System
Author:    PerSeptive Biosystems
Source:    MOSS 1 umole Protocol Master
Type:      DNA, normal
Scale:     1 micromole
Comments:  MOSS protocol for the synthesis of
           DNA at the 1 umole scale.

/* Function                    Mode   Amount  Time(sec)  Description               */
/*                                    /Arg1   /Arg2                                */
$Deblocking
144 /*Index Fract. Coll.   */  NA     1       0          "Event out ON"
0   /*Default              */  WAIT   0       1.5        "Wait"
16  /*Dblk                 */  PULSE  20      0          "Dblk to column"
141 /*Trityl Mon. On/Off   */  NA     1       1          "START data collection"
16  /*Dblk                 */  PULSE  20      0          "Dblk to column"
16  /*Dblk                 */  PULSE  30      30         "Deblock"
36  /*Diverted Wsh A       */  PULSE  20      20         "Deblock"
38  /*Diverted Wsh A       */  PULSE  60      0          "Flush system with Wsh A"
141 /*Trityl Mon. On/Off   */  NA     0       1          "STOP data collection"
144 /*Index Fract. Coll.   */  NA     2       0          "Event out OFF"
$Coupling
1   /*Wsh                  */  PULSE  5       0          "Flush system with Wsh"
2   /*Act                  */  PULSE  5       0          "Flush system with Act"
41  /*Gas B                */  PULSE  1       5          "Gas B"
18  /*A + Act              */  PULSE  7       0          "Monomer + Act to column"
2   /*Act                  */  PULSE  3       0          "Chase with Act"
1   /*Wsh                  */  PULSE  8       0          "Chase with Wsh"
1   /*Wsh                  */  PULSE  20      104        "Couple monomer"
1   /*Wsh                  */  PULSE  2       0          "Flush with Wsh"
$Capping
13  /*Caps                 */  PULSE  8       0          "Caps to column"
12  /*Wsh A                */  PULSE  10      0          "Chase with Wsh A"
12  /*Wsh A                */  PULSE  20      15         "Slow pulse to cap"
$Oxidizing
15  /*Ox                   */  PULSE  15      0          "Ox to column"
12  /*Wsh A                */  PULSE  5       0          "Chase with Wsh A"
$Capping
13  /*Caps                 */  PULSE  [illegible]  0     "Caps to column"
12  /*Wsh A                */  PULSE  60      0          "End of cycle wash"

Table 2: Synthesis parameters for generation of combinatorial sets of
identifier sequences or oligonucleotide complements.

R. Example 18: Synthesis and Hybridization of
Target Nucleic Acids

The identifier sequences can be attached to library sequences in a variety of
ways, as described herein. Other issues which must be addressed in preparation of
the target nucleic acid for hybridization to beads include that the target must be
labeled with a fluorochrome; the target must be generated in sufficient quantity; and
the target must be of a size that permits hybridization to beads in an optimal manner,
such that sufficient signal can be detected in complex mixtures. Typically,
sequences less than 100 base pairs are preferred.

The following describes one approach, which uses in vitro transcription
methodology, for generating fluorescently-labeled RNA. The RNA is then
hybridized to beads which have the complementary DNA sequence synthesized on
them (see Example 17).

Experimental System:

The following exemplifies a construction in which an ID tag which was
generated in the ID tag library is placed downstream of a strong promoter (e.g., the
bacteriophage T7 promoter). The vector containing the T7 promoter was cut with
two endonucleases, e.g., PstI and EcoRI. A double-stranded ID tag with
homologous ends was ligated into the site. The vector containing the T7 promoter
with the downstream ID tag was then linearized using another restriction enzyme
(e.g., SalI) and the construct used as a template for in vitro transcription. By
cutting the template downstream from the ID tag (e.g., with SalI), an approximately
50 base pair (bp) run-off RNA transcript was generated upon in vitro transcription
(see below).

T7 promoter →   PstI   EcoRI   SalI

CGTACGATTATGCTCACTGATATCCCGACGTCCCCTTAilGACGTACGTTCGA'rCaAGCATGCATC^GC^^CC, . 



T7 promoter →   PstI   ID Tag   EcoRI   SalI

OTAA.TACGACTCACTATAGGGCTGCAGGCTGTACAGTaU^AAGAAGCC^ • * 

CCTACGATTATGCTGAGTGATATCCCCSlCGTCCa^CATGTCAGTTTTCTTCGGCTTAAGACGTACGTTCGATCGTGCArCAGCT^ . 



In vitro Transcription Protocol:

100 µl total volume reaction:
1 mM rATP
1 mM rGTP
1 mM rUTP
0.5 mM rCTP
0.5 mM Fluorescein-12-CTP (NEL434 from NEN Life Sciences)
1 µg of linearized Template (7 kB Plasmid)
10 µl of T7, RNasin, pyrophosphatase mix (Promega RiboMAX #P1300)
20 µl of Transcription Buffer (400 mM HEPES-KOH, pH 7.5, 120 mM MgCl2,
10 mM spermidine, 200 mM DTT)



The reaction was incubated for 4 hours at 37°C, another 10 µl of enzyme mix
was added and the reaction incubated for an additional 4 hours at 37°C. The DNA
template was removed after the transcription reaction by digesting with RQ1 RNase-
free DNase at 1 U/µg of template for 15 minutes at 37°C. The reaction was extracted
with one volume of phenol:chloroform:isoamyl alcohol (25:24:1) pH 4.5 and ethanol
precipitated using sodium acetate and 70% ethanol. The ethanol precipitate was
resuspended in DEPC-treated double-distilled water (ddH2O). A 260nm/280nm
spectrophotometer reading was taken to approximate the concentration of the RNA
transcript using standard techniques. The fluorescently-labeled RNA was then ready
for hybridization to beads.
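
The text does not spell out the "standard techniques" used to estimate concentration from the absorbance reading. One common approximation, offered here only as a sketch, takes one A260 unit of single-stranded RNA (1 cm path) as roughly 40 µg/ml and converts to molarity using an assumed average mass per nucleotide. Both constants, the function names and the example numbers are assumptions and are not taken from the specification; the fluorescein label may also contribute to the absorbance.

```python
# Approximate RNA concentration from an A260 reading (illustrative only).
RNA_UG_PER_ML_PER_A260 = 40.0   # common approximation for single-stranded RNA
AVG_NT_MASS_G_PER_MOL = 340.0   # rough average mass per nucleotide (assumption)


def rna_ug_per_ml(a260, dilution_factor=1.0, path_cm=1.0):
    """Mass concentration in ug/ml from A260, dilution factor and path length."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_cm


def rna_micromolar(ug_per_ml, length_nt):
    """Molar concentration in uM for a transcript of length_nt nucleotides."""
    molecular_weight = length_nt * AVG_NT_MASS_G_PER_MOL  # g/mol
    # ug/ml equals mg/l; (mg/l) / (g/mol) is mmol/l, so multiply by 1000 for uM
    return ug_per_ml / molecular_weight * 1000.0


if __name__ == "__main__":
    conc = rna_ug_per_ml(a260=0.25, dilution_factor=100)  # hypothetical reading
    print(conc, rna_micromolar(conc, length_nt=50))
```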




Hybridization Protocol:

Optimal conditions for hybridization are preferred so that good signal-to-noise
ratios are achieved. This permits the method to be extended to complex mixtures of
target nucleic acid, a feature that is necessary for most genetic experiments. An
exemplary hybridization experiment is described below. Those of skill in the art can
determine empirically optimum hybridization conditions for chosen target nucleic acids
and oligonucleotide identifier tags.

100,000 beads having the complementary sequence to the RNA transcript (see
above) were added to 1 nM final concentration of labeled RNA transcript in 100 µl of
hybridization buffer. The temperature was raised to 60°C and the nucleic acids
hybridized for 16 hours. The hybridized beads were washed 3x with wash buffer at
60°C and resuspended in 1 ml PBS. The hybridized beads were then analyzed on a flow
cytometer as described herein.

Hybridization Buffer: 20 mM phosphate buffer, 298 mM NaCl, 2 mM EDTA, pH 7.4,
0.5% SDS.
Wash Buffer: 10 mM phosphate buffer, 149 mM NaCl, 1 mM EDTA, pH 7.4, 0.1% SDS.
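
To put the stated quantities in perspective, the number of labeled transcripts available per bead can be estimated from the concentration, the volume and the bead count. The sketch below uses the figures as printed above (1 nM, 100 µl, 100,000 beads); the function names are illustrative, and the result is an upper bound on target per bead, not a measured binding capacity.

```python
# Estimate the number of target molecules available per bead (illustrative).
AVOGADRO = 6.02214076e23  # molecules per mole


def target_molecules(conc_nM: float, volume_ul: float) -> float:
    """Number of molecules in volume_ul microliters at conc_nM nanomolar."""
    moles = (conc_nM * 1e-9) * (volume_ul * 1e-6)
    return moles * AVOGADRO


def molecules_per_bead(conc_nM: float, volume_ul: float, n_beads: int) -> float:
    """Average number of target molecules per bead if shared evenly."""
    return target_molecules(conc_nM, volume_ul) / n_beads


if __name__ == "__main__":
    print(molecules_per_bead(conc_nM=1.0, volume_ul=100.0, n_beads=100_000))
```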

Flow cytometry experiments to optimize hybridization: 

The following experiments examine the effect of the position of the identifier
sequence tag within an RNA transcript on the efficiency of hybridization to
complementary capture oligonucleotide sequences attached to beads. The experiments
demonstrate that it is preferable to position a 24 nucleotide sequence ID tag at the 5'
end or in the middle of a 60 nucleotide labeled RNA transcript rather than at the 3' end
of the transcript (see FIGURE 18).

Fluorescent RNA transcripts (approximately 60 bases long) comprising 24
nucleotide sequence ID tags at their 5' or 3' end, or in the middle of the transcript,
were synthesized using the T7 in vitro transcription system, essentially as described
above. DNA oligonucleotides were synthesized, and capture oligonucleotides were
attached to beads, essentially as described in Example 17. Hybridization reactions
were performed as described above.

FIGURE 18 depicts flow cytometric analyses using fluorescently
labeled RNA transcripts (approximately 60 bases in length) comprising 24 base
oligonucleotide identifier tags at their 5' end (A; "5' bead"); 3' end (B; "3' bead"); or
approximately in the middle of the transcript (C; "Mid bead"); hybridized to beads with
attached complementary capture oligonucleotides (24-mers). Beads with attached
DNA capture oligonucleotides which were not complementary to the oligonucleotide
tags (i.e., non-specific sequences) were used as a control (D; "NS bead"). Panel A (5'
ID tags) shows that each of the two test RNA samples (5 µM or 1 µM) hybridized
efficiently to the beads compared to the positive controls (5' c' and 60mer DNA).
Panel B (3' ID tags), in contrast, shows that each of the two test RNA samples (5 µM
or 1 µM) hybridized much less efficiently to the beads compared to the positive
controls (5' c' and 60mer DNA). Panel C (Middle ID tags) shows results similar to
those of Panel A, suggesting that oligonucleotide ID tags also function well when
placed in the middle of these RNA transcripts (e.g., when they are less than 36 bases
from the 5' end of a 60 base transcript). Panel D (NS Bead) shows that no specific
binding occurs to beads when the attached oligonucleotides are non-complementary
(negative control).

S. Example 19: Selection of Target Nucleic Acids Using 13,824
Complementary ID Tags as Capture Oligonucleotides

To demonstrate that the methods of this invention may be used to select
specific nucleic acid sequences from a complex mixture of sequences, a set of 13,824
different identifier sequence tagged beads were constructed from minimally cross-
hybridizing 8-mer sequence units. The C++ source code depicted in FIGURE 16 may
be used to select 8-mer sequences that comprise a set with minimal cross-hybridization
between the constituent members. These 8-mer sequence units were used to generate
unique 24-mer sequence ID tags according to the "pool and split" synthetic strategy as
described herein (see, e.g., Section IV.C and FIGURE 7).
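
The source code of FIGURE 16 is not reproduced in this section. Purely as an illustration of how a set of mutually dissimilar 8-mers might be chosen, the sketch below greedily accepts candidates that differ from every previously accepted unit at a minimum number of positions and that have a fixed G/C count. The mismatch threshold, the G/C criterion and all names are assumptions made for this example; this is not the algorithm of FIGURE 16, and positional mismatch count is only a crude proxy for cross-hybridization.

```python
# Greedy selection of mutually dissimilar 8-mer units (illustrative sketch).
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_count(seq: str) -> int:
    """Number of G or C bases in the sequence."""
    return sum(base in "GC" for base in seq)


def select_units(n_units=48, length=8, min_mismatches=3, gc=4):
    """Scan all sequences in lexicographic order, keeping those that have the
    requested G/C count and differ from every kept unit at min_mismatches or
    more positions, until n_units have been collected."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        if gc_count(seq) != gc:
            continue
        if all(hamming(seq, other) >= min_mismatches for other in chosen):
            chosen.append(seq)
            if len(chosen) == n_units:
                break
    return chosen


if __name__ == "__main__":
    units = select_units()
    print(len(units), units[:5])
```

A fuller treatment would presumably also compare each candidate against the reverse complements of the accepted units and consider shifted alignments.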

The following experiment demonstrates that these unique 24-mer sequence ID
tags can efficiently select nucleic acid sequences from a complex mixture of target
nucleic acids and beads. A subset of the sequence ID tags from the pool produced
above (containing 1,728 different sequences of the 13,824 total sequences; 12.5%)
was fluorescently labeled and used as a target nucleic acid pool for hybridization to
beads with attached capture oligonucleotides representing the 13,824 ID tag library.
Hybridized beads were analyzed by flow cytometry, as described below.

Hybridization conditions for the 13,824 ID Tag Library:

Hybridization reactions were performed in 100 µl hybridization buffer
containing 100,000 beads and 8 µM final concentration of the ID tag pool containing
1,728 different sequences. The temperature was raised to 60°C and the reaction
mixture was hybridized for 16 hours. Hybridized beads were washed 2x with wash
buffer at 60°C and resuspended in 1 ml PBS. Hybridized beads were then analyzed by
flow cytometry (see, e.g., Example 2).

Hybridization Buffer: 20 mM phosphate buffer, 298 mM NaCl, 2 mM EDTA, pH 7.4,
0.5% SDS.
Wash Buffer: 10 mM phosphate buffer, 149 mM NaCl, 1 mM EDTA, pH 7.4, 0.1% SDS.

FIGURE 17 depicts flow cytometric histograms (number of events, i.e., beads,
plotted against the fluorescent intensity) of individual beads from the fluorescently
labeled target nucleic acid population hybridized to complementary identifier sequences
on beads. Panel (A) shows the autofluorescence of the 13,824 different identifier
sequence-tagged beads (FL1 = 525 +/- 20nm light; FL2 = 575 +/- 15nm light). Panel
(B) shows that approximately 7.9% of the 13,824 different identifier sequence-tagged
beads specifically hybridized to HEX-labeled complementary identifier sequence tags
(ID Tags) in the target nucleic acid pool. The 13,824 fluorescently labeled
complementary ID tags were maintained in 8 mutually exclusive pools each containing
1,728 different ID tags. In a similar experiment, 10.4% of the 13,824 different
identifier sequence-tagged beads specifically hybridized to FAM-labeled
complementary identifier sequence tags (ID Tags) in a target nucleic acid pool
representing 12.5% of the 13,824 total sequence ID tags.

The target nucleic acid pool represented 12.5% of the 13,824 total sequence
ID tags and approximately 7.9% (HEX-labeled) and 10.4% (FAM-labeled) of the total
sequences were recovered by hybridization to the beads in the experiments depicted in
panels A and B. This shows that, using the methods and compositions of this
invention, one can detect and recover a specific fraction of sequences from a complex
mixture as specifically hybridized material on beads and can separate the specific
fraction from unhybridized nucleic acid sequences.
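
The reported percentages can be compared with the fraction of beads expected to carry a complementary capture sequence. The sketch below is simple arithmetic on the figures quoted above; the function names are illustrative, and reading the ratio as a "relative recovery" is our interpretation and is not a quantity defined in the specification.

```python
# Compare observed hybridized-bead fractions with the expected fraction.
TOTAL_TAGS = 13824
POOL_TAGS = 1728


def expected_fraction() -> float:
    """Fraction of beads whose tag has a complement in the target pool."""
    return POOL_TAGS / TOTAL_TAGS


def relative_recovery(observed_fraction: float) -> float:
    """Observed positive fraction divided by the expected fraction."""
    return observed_fraction / expected_fraction()


if __name__ == "__main__":
    print(expected_fraction())        # 0.125
    print(relative_recovery(0.079))   # HEX-labeled pool
    print(relative_recovery(0.104))   # FAM-labeled pool
```

On these figures, roughly 63% (HEX) and 83% (FAM) of the expected positive beads were scored as hybridized.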

All references cited within the body of the instant specification are hereby 
incorporated by reference in their entirety. 
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SEQUENCE LISTING 



(1) GENERAL INFORMATION

(i) APPLICANT: Ventana Genetics, Inc.
Kamb, Alexander
Feldhaus, Michael J.

(ii) TITLE OF THE INVENTION: METHODS FOR MEASURING
RELATIVE AMOUNTS OF NUCLEIC ACIDS IN A COMPLEX MIXTURE
AND RETRIEVAL OF SPECIFIC SEQUENCES THEREFROM

(iii) NUMBER OF SEQUENCES: 3

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: FISH & NEAVE
(B) STREET: 1251 Avenue of the Americas
(C) CITY: New York
(D) STATE: New York
(E) COUNTRY: USA
(F) ZIP: 10020

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ Version 2.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 12-DEC-1997
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/764,191
(B) FILING DATE: 13-DEC-1996

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: James F. Haley, Jr.
(B) REGISTRATION NUMBER: 27,794
(C) REFERENCE/DOCKET NUMBER:

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 212-596-9000
(B) TELEFAX: 212-596-9090
(C) TELEX:

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCTGCATAAA CCGACTACAC 20

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GCATTATCCG AACCATCCGC 20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: unknown
(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CCG AGTGTGA TCA i C l GG l C 20 






WHAT IS CLAIMED IS:

1. A method for comparing relative amounts of nucleic acids in at least
two samples, comprising the steps of:

(a) generating a pool of a first and a second sample, wherein said
first sample comprises nucleic acid molecules of a first source, and said first source
nucleic acid molecules are linked to a first label, and wherein said second sample
comprises nucleic acid molecules of a second source, and said second source nucleic
acid molecules are linked to a second label;

(b) contacting said pool of first and second sample with a plurality
of solid supports each having attached thereto multiple oligonucleotides or nucleic acid
fragments of a unique sequence under conditions that promote the formation of
perfectly matched duplexes between said oligonucleotides and nucleic acid molecule
complements within said pool of first and second sample;

(c) sorting the solid supports according to the relative amount of
said first label and said second label; and

(d) determining the identity of nucleic acid molecules having said
relative amount of said first and said second label of interest;

wherein the unique oligonucleotides or nucleic acid fragments attached to the solid
support comprise a stretch of from about 10 to about 40 nucleotides of random
sequence, or a combination of from about 2 to about 6 sequence units in tandem
configuration, each unit consisting of from about 7 to about 15 nucleotides.

2. The method of Claim 1, wherein the identity of the nucleic acid
molecules in step (d) is determined by direct DNA sequence analysis of the nucleic
acids hybridized to the bead.

3. The method of Claim 1, wherein the identity of the nucleic acid
molecules in step (d) is identified indirectly by determination of the sequence of the
oligonucleotide or fragment attached to the bead.

4. The method of Claim 1, wherein said first and said second label are
distinguishable fluorescent labels.

5. The method of Claim 4, wherein said fluorescent labels are individually
selected from the group consisting of 6-FAM, HEX, TET, TAMRA, ROX, JOE, 5-
FAM, phycoerythrin and R110.

6. The method of Claim 1, wherein the oligonucleotides attached to the
solid support have a length of from about 10 to about 50 nucleotides.

7. The method of Claim 1, wherein the nucleic acid fragments attached to
the solid support have a length of from about 50 to about 5,000 nucleotides.

8. The method of Claim 6, wherein said solid supports have attached
thereto oligonucleotides complementary to nucleic acid molecules representing
particular transcripts of interest.

9. The method of Claim 6, wherein said solid supports have attached
thereto oligonucleotides having a length of from about 12 to about 30 nucleotides and
comprising a stretch of from about 10 to about 20 nucleotides of random sequence.

10. The method of Claim 6, wherein said solid supports have attached
thereto oligonucleotides comprising a stretch of from about 5 to about 25 adenosine
residues at the 3' end, and a stretch of from about 8 to about 16 nucleotides of random
sequence at the 5' end.

11. The method of Claim 1, wherein the nucleic acid molecules in the pool
are selected from the group consisting of mRNA and cDNA.

12. The method of Claim 11, wherein nucleic acid molecules in the pool
are cDNA molecules.

13. The method of Claim 12, wherein the cDNA molecules have attached
thereto unique oligonucleotide identifier tags, each of said tags comprising a
combination of from about 2 to about 6 sequence units in tandem configuration, each
unit consisting of from about 7 to about 15 nucleotides.

14. The method of Claim 13, wherein the capture oligonucleotides attached
to said beads comprise complements of said identifier tags.

15. The method of any one of Claims 11-13, wherein the nucleic acid
molecules of said first and said second sample are derived from cancerous and non-
cancerous tissue, respectively.

16. The method of any one of Claims 11-13, wherein the nucleic acid
molecules of said first and said second sample are derived from plant cells, insect cells,
fungal cells, bacterial cells, virus infected and uninfected cells, senescent and non-
senescent cells, parental arrested cells and revertant growth proficient cells, or
transgenic and normal cells.

17. The method of any one of Claims 11-13, wherein the nucleic acid
molecules of said first and said second sample are derived from cells before and after
treatment with an agent, respectively.

18. The method of Claim 17, wherein the agent is selected from the group
consisting of a naturally occurring growth factor, an immunologic factor, a therapeutic
compound, a therapeutic lead compound, and a growth-arresting substance.

19. The method of any one of Claims 1, 12 or 13, wherein the nucleic acid
molecules of said first and said second sample are derived from a genetic library.

20. The method of Claim 19, further comprising the step of introducing said
genetic library into two separate cell populations, wherein said genetic library
comprises a plurality of inserts comprising one or more regulatory sequences which
regulate gene expression in a living cell.

21. The method of Claim 20, further comprising the steps of:

(a) propagating the separate cell populations harboring said genetic
library for several cell generations,

(b) re-isolating said genetic library separately from the two cell
populations, and

(c) subjecting the individual library inserts of the two cell
populations to quantitative comparison.

22. The method of Claim 21, wherein the capture oligonucleotides attached
to said beads comprise complements of said identifier tags.




23 . A method of normalizing a genetic library, comprising the steps of: 

(a) attaching unique oligonucleotide identifier tags to cDNA inserts 
derived from a genetic library; 

(b) hybridizing the cDNA inserts of step (a) with cDNA derived 
5 from a source of interest under conditions that promote the formation of perfectly 

matched duplexes, wherein the cDNA is labeled with a first label; 

(c) contacting the mixture of step (b) with solid supports having 
attached thereto the complements of the oligonucleotide identifier tags under 
conditions that promote the formation of perfectly matched duplexes between the 

1 0 oligonucleotide identifier tags and their respective complements in the presence of fi-ee 
oligonucleotide identifier tags labeled with a second label and corresponding in 
sequence to the oligonucleotide identifier tags of step (a); 

(d) sorting solid supports according to the relative amount of said 
first label and said second label, wherein said relative amount of said first and second 

15 labtl CuiTCldtcS Vvitii tliC auuiiuoiiCt; Old CDNA, \ji HiRIS-ri. bC^^uciiO*^, aiiu 

(e) amplifying cDNA inserts present at lower abundance in order to 
match the abundance of all cDNA sequences such that they are represented at 
substantially similar levels in the library. 

24. The method of Claim 23, wherein said first and said second label are
distinguishable fluorescent labels.

25. The method of Claim 24, wherein said fluorescent labels are individually
selected from the group consisting of 6-FAM, HEX, TET, TAMRA, ROX, JOE,
5-FAM, phycoerythrin and [illegible].

26. The method of Claim 23, wherein the oligonucleotide identifier tag has
a length of from about 16 to about 32 nucleotides.

27. The method of Claim 26, wherein the oligonucleotide identifier tag is a
24-mer.

28. A method for producing nucleic acid molecules having unique sequence
identifier tags attached thereto comprising the steps of:

(a) generating sequence identifier tags comprising a combination of
between 2 and 6 sequence units in tandem, each unit consisting of from about 7 to
about 15 nucleotides; and

(b) attaching said sequence identifier tags to a population of nucleic acid
molecules.

29. The method of Claim 28, further comprising the steps of random
priming or oligo(dT) priming with primers that comprise sequence identifier tags, or
ligation of sequence identifier tag adapters onto said nucleic acid population, followed
by cloning in E. coli or another host.

30. A normalized genetic library produced according to the method of
Claim 23.

31. A nucleic acid comprising an oligonucleotide identifier tag, said tag 
comprising a combination of from about 2 to about 6 sequence units in tandem 
configuration, each unit consisting of from about 7 to about 15 nucleotides. 

32. A solid support having attached thereto multiple copies of an
oligonucleotide of unique sequence, each oligonucleotide comprising a stretch of about
5 to about 25 adenosine residues at the 3' end, and a stretch of about 8 to about 16
nucleotides of random sequence at the 5' end.

33. A solid support having attached thereto multiple copies of an
oligonucleotide of unique sequence, each oligonucleotide comprising a complement to
an oligonucleotide identifier tag, said tag comprising a stretch of from about 10 to
about 40 nucleotides of random sequence, or a combination of from about 2 to about 6
sequence units in tandem configuration, each unit consisting of from about 7 to about
15 nucleotides.

34. The method of any one of Claims 11-13, wherein the nucleic acid 
molecules represent the relative amounts of nucleic acids normally present within a cell 
or tissue. 
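
The tag structure recited in Claims 28, 31 and 33 (2 to 6 sequence units joined in tandem) can be illustrated with a short sketch. The helper names BuildTag and CountTags, the example unit sequences, and the assumption that a unit may be reused at more than one position are illustrative only and are not taken from the specification. With three 8-nucleotide units the sketch yields a 24-mer, the tag length recited in Claim 27.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: join the chosen sequence units head to tail.
// units: pool of unit sequences; picks: indices into that pool, in tag order.
std::string BuildTag(const std::vector<std::string>& units,
                     const std::vector<int>& picks)
{
    // The claims recite 2 to 6 units per tag.
    assert(picks.size() >= 2 && picks.size() <= 6);
    std::string tag;
    for (int idx : picks)
        tag += units.at(idx);   // at() throws if an index is out of range
    return tag;
}

// Number of distinct ordered tags obtainable from n_units units when each
// tag has units_per_tag positions and a unit may repeat (an assumption).
long CountTags(long n_units, int units_per_tag)
{
    long total = 1;
    for (int i = 0; i < units_per_tag; ++i)
        total *= n_units;
    return total;
}
```

Under these assumptions a pool of 100 mutually compatible units arranged three per tag would give 100 x 100 x 100 distinct tags, which is why a modest set of well-behaved units can label a large library.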
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MATCH_NEEDED_IN_2ND 3

#define LEN_MER 8 // recompile when changed. *things to pay attention to.
int SecondStruct(const char *);

int CrossHyb(const char *str1, const char *str2, int overlap_length);
int SimpleMatch(const char *str1, const char *str2);
char FirstN(const char *str1, const char *str2, int N);

int GC_ADDITION = 1;
int NUM_GC = 4;

int SCORE_NEEDED_IN_2ND = 7; // 1+2+4
char eq1, eq2;

FILE *fp = stdout;
FILE *fplog = stderr;

int main(int argc, char **argv)
{
    // loop counters and work variables used throughout main.
    int ii, jj, s;
    int MM[LEN_MER];
    char str[LEN_MER+1];
    int gcsum, total_probes, pass_gc;
    int failed_ch, failed_sm, failed_fn;

    char convert[4]; // 0123 to atcg conversion.
    char *compatible;

    int max_prb, cnt_prb;
    char **probe;

    int max_snd, cnt_snd;
    char **sndstr;
    int *snd_matchcnt;

    int SIMPLE_CUTOFF = 5; // reject if this many bps match to each other,
                           // no matter where they are located.

    int CROSSHYB_CUTOFF = 9; // 1+2+4 + 2
    int CROSSHYB_OVERLAP = 5;

    int FIRST_N = 4;

    eq1 = eq2 = ' ';
    if(argc == 1)
    {
        fprintf(stderr,
                "Usage: %s output_file[stdout]\n",
                argv[0]);
        fprintf(stderr,
                "\t\t-gc number_of_GCs_in_probe[%d]\n",
                NUM_GC);
        fprintf(stderr,
                "\t\t-2 secondary_structure_reject(including this value)[%d]\n",
                SCORE_NEEDED_IN_2ND);
        fprintf(stderr,
                "\t\t-ch crosshyb_reject(including this value)[%d]\n",
                CROSSHYB_CUTOFF);
        fprintf(stderr,
                "\t\t-sm simple_match_reject(including this value)[%d]\n",
                SIMPLE_CUTOFF);
        fprintf(stderr,
                "\t\t-ol crosshyb_overlap_length[%d]\n",
                CROSSHYB_OVERLAP);
        fprintf(stderr,
                "\t\t-eq add'l_equiv_bp_in_compat_checking[%c%c] (e.g., -eq gt)\n",
                eq1, eq2);
        fprintf(stderr, "\t\t-fn first_N_length[%d]\n", FIRST_N);
        fprintf(stderr, "\t\t-gc_add GC_add'l_penalty[%d]\n", GC_ADDITION);
        exit(0);
    }

    // parse input parameters.
    ii = 1;
    while(ii < argc)
    {
        if(strcmp(argv[ii], "-gc") == 0)
            sscanf(argv[ii+1], "%d", &NUM_GC);
        else if(strcmp(argv[ii], "-2") == 0)
            sscanf(argv[ii+1], "%d", &SCORE_NEEDED_IN_2ND);
        else if(strcmp(argv[ii], "-ch") == 0)
            sscanf(argv[ii+1], "%d", &CROSSHYB_CUTOFF);
        else if(strcmp(argv[ii], "-ol") == 0)
            sscanf(argv[ii+1], "%d", &CROSSHYB_OVERLAP);
        else if(strcmp(argv[ii], "-eq") == 0)
        {
            // The listing also accepts a 3-character argument; the test on
            // its third character is illegible in the source and is omitted.
            if(strlen(argv[ii+1]) == 2)
            {
                eq1 = argv[ii+1][0];
                eq2 = argv[ii+1][1];
            }
            else
            {
                fprintf(stderr, "\nERROR: Invalid string after -eq flag.\n\n");
                exit(1);
            }
        }
        else if(strcmp(argv[ii], "-o") == 0)
        {
            if((fp = fopen(argv[ii+1], "w")) == NULL)
            {
                fprintf(stderr, "Can't open file %s to write.\n", argv[ii+1]);
                exit(1);
            }

            char logname[128];
            snprintf(logname, sizeof logname, "%s.log", argv[ii+1]);
            if((fplog = fopen(logname, "w")) == NULL)
            {
                fprintf(stderr, "failed creating log. stderr used.\n");
                fplog = stderr;
            }
        }
        else if(strcmp(argv[ii], "-fn") == 0)
        {
            sscanf(argv[ii+1], "%d", &FIRST_N);
        }
        else if(strcmp(argv[ii], "-sm") == 0)
        {
            sscanf(argv[ii+1], "%d", &SIMPLE_CUTOFF);
        }
        else if(strcmp(argv[ii], "-gc_add") == 0)
        {
            sscanf(argv[ii+1], "%d", &GC_ADDITION);
        }

FIG. 16C 
        else
        {
            fprintf(stderr, "Unknown flag %s\n", argv[ii]);
            exit(1);
        }
        ii += 2;
    }

    max_prb = 30000;
    probe = new char* [max_prb];
    for(ii = 0; ii < max_prb; ii++)
        probe[ii] = new char [LEN_MER+1];
    cnt_prb = 0;

    max_snd = 5000;
    sndstr = new char* [max_snd];
    for(ii = 0; ii < max_snd; ii++)
        sndstr[ii] = new char [LEN_MER+1];
    snd_matchcnt = new int [max_snd];
    cnt_snd = 0;

    // build an array of probes. Each probe is of length LEN_MER,
    // of which NUM_GC are Gs or Cs.
    convert[0] = 'a';
    convert[1] = 't';
    convert[2] = 'c';
    convert[3] = 'g';

    total_probes = 0;
    pass_gc = 0; // number of probes pass GC test.
    for(MM[0] = 0; MM[0] < 4; MM[0]++)
    for(MM[1] = 0; MM[1] < 4; MM[1]++)
    for(MM[2] = 0; MM[2] < 4; MM[2]++)
    for(MM[3] = 0; MM[3] < 4; MM[3]++)
    for(MM[4] = 0; MM[4] < 4; MM[4]++) //*things to pay attention to.
    for(MM[5] = 0; MM[5] < 4; MM[5]++) //*things to pay attention to.
    for(MM[6] = 0; MM[6] < 4; MM[6]++) //*things to pay attention to.
    for(MM[7] = 0; MM[7] < 4; MM[7]++) //*things to pay attention to.
    {
        total_probes++;
        gcsum = 0;

        // build a probe.
        for(jj = 0; jj < LEN_MER; jj++)
        {
            str[jj] = convert[MM[jj]];
            if(str[jj] == 'c' || str[jj] == 'g')
                gcsum++;
        }
        str[LEN_MER] = '\0';

        // check its GC contents and secondary structure.
        if(gcsum == NUM_GC)
        {
            pass_gc++;
            fprintf(fplog, "pass GC test: %s\n", str);
            if(!SecondStruct(str))
            {
                strcpy(probe[cnt_prb], str);
                if(++cnt_prb == max_prb)
                {
                    // should reallocate memory.
                    // To simplify the program, let's just give an error msg.
                    fprintf(stderr, "ERROR: Probe array is too small. cnt_prb is %d\n", cnt_prb);
                    exit(1);
                }
            }
            else
            {
                // record the rejected string
                strcpy(sndstr[cnt_snd], str);
                if(++cnt_snd == max_snd)
                {
                    fprintf(stderr, "ERROR: Secondary Structure array is too small. cnt_snd = %d\n",
                            cnt_snd);
                    exit(1);
                }
            }
        }
    }

    fprintf(fp, "\n%d mer probe selection\n", LEN_MER);
    fprintf(fp, "Number of GCs in the probes: %d\n", NUM_GC);
    fprintf(fp, "Score to reject as secondary structure: %d\n",
            SCORE_NEEDED_IN_2ND);
    fprintf(fp, "Score to reject as incompatible: %d\n", CROSSHYB_CUTOFF);

FIG. 16E 
    fprintf(fp, "Compatible test overlap: %d\n", CROSSHYB_OVERLAP);
    fprintf(fp, "Additional equivalent base-pair in compatibility checking: %c%c\n",
            eq1, eq2);
    fprintf(fp, "Simple match cutoff value(including): %d\n", SIMPLE_CUTOFF);
    fprintf(fp, "First N value(including): %d\n", FIRST_N);
    fprintf(fp, "Additional penalty for G or C: %d\n", GC_ADDITION);
    fprintf(fp, "\n\n");

    fprintf(fp, "Total possible %d mers: %d\n", LEN_MER, total_probes);
    fprintf(fp, "Number passed GC_test : %d\n", pass_gc);
    fprintf(fp, "Number passed secondary structure test : %d\n", cnt_prb);
    // for(ii = 0; ii < cnt_snd; ii++)
    //     fprintf(fp, "%s\n", sndstr[ii]);

    // From the set (call it set1) of probes which passed GC and 2nd structure
    // tests, choose a probe into the final set (set2). Then compare this
    // probe against all the probes left in set1 and throw out the ones
    // that may crosshyb to this probe. From what's left in set1, choose
    // another probe and compare it to the rest of set1...

    compatible = new char [cnt_prb];
    for(ii = 0; ii < cnt_prb; ii++)
    {
        compatible[ii] = 'T';
    }

    // Compatibility check #1: Use weighted scores to penalize neighboring matches.
    // first_match_score = 1;
    // if prev pair is a match, current_match_score = prev_match_score*2.
    ii = 0;
    failed_ch = 0;
    while(ii < cnt_prb)
    {
        for(jj = ii+1; jj < cnt_prb; jj++)
        {
            if(compatible[jj] == 'T' &&
               (s = CrossHyb(probe[ii], probe[jj], CROSSHYB_OVERLAP)) >= CROSSHYB_CUTOFF)
            {
                compatible[jj] = 'F';
                failed_ch++;
                fprintf(fplog, "Rejected(%d) %s in slide test for %s\n",
                        s, probe[jj], probe[ii]);
            }
        }
        // advance to the next probe that is still marked compatible.
        ii++;
        while(ii < cnt_prb && compatible[ii] == 'F')
            ii++;
    }
    fprintf(fp, "Number of probes passed compatibility test: %d\n",
            cnt_prb - failed_ch);

    // Compatibility check #2: Use unweighted score: count nonconsecutive matches
    // find the first 'passed' probe.
    ii = 0;
    while(ii < cnt_prb && compatible[ii] == 'F')
        ii++;
    failed_sm = 0;
    while(ii < cnt_prb)
    {
        for(jj = ii+1; jj < cnt_prb; jj++)
        {
            if(compatible[jj] == 'T' &&
               (s = SimpleMatch(probe[ii], probe[jj])) >= SIMPLE_CUTOFF)
            {
                compatible[jj] = 'F';
                fprintf(fplog, "Rejected(%d) %s in simple_match test for %s\n",
                        s, probe[jj], probe[ii]);
                failed_sm++;
            }
        }
        ii++;
        while(ii < cnt_prb && compatible[ii] == 'F')
            ii++;
    }
    fprintf(fp, "Number of probes passed simple match test: %d\n",
            cnt_prb - failed_ch - failed_sm);



    // Compatibility check #3: if the first N bases match ANYWHERE in another probe.
    // find the first 'passed' probe.
    ii = 0;
    while(ii < cnt_prb && compatible[ii] == 'F')
        ii++;
    failed_fn = 0;
    while(ii < cnt_prb)
    {
        for(jj = ii+1; jj < cnt_prb; jj++)
        {
            if(compatible[jj] == 'T' &&
               FirstN(probe[ii], probe[jj], FIRST_N) == 'T')
            {
                compatible[jj] = 'F';
                failed_fn++;
                fprintf(fplog, "Rejected %s in FIRSTN test for %s\n",
                        probe[jj], probe[ii]);
            }
        }
        ii++;
        while(ii < cnt_prb && compatible[ii] == 'F')
            ii++;
    }
    fprintf(fp, "Number of probes passed FIRSTN compatibility test: %d\n",
            cnt_prb - failed_ch - failed_sm - failed_fn);

    // output.
    fprintf(fp, "\nSelected probes are: \n");
    for(ii = 0; ii < cnt_prb; ii++)
    {
        if(compatible[ii] == 'T')
        {
            fprintf(fp, "%s \n", probe[ii]);
        }
    }
    return 0;
}


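// Check if 'str' contains a secondary structure. That is, if there are
// 3 consecutive bases that match when 'str' is folded.
// return 1 if found secondary structure, 0 otherwise.

int SecondStruct(const char *str)
{
    int ii, jj, kk, ll;
    int sum, score[32];
    char prev_match;
    // The listing names this buffer 'compl', which is a reserved alternative
    // token in standard C++, so it is spelled 'cmpl' here.
    char *cmpl;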



FIG. 16H 
    char complement[256];
    complement['a'] = 't';
    complement['t'] = 'a';
    complement['c'] = 'g';
    complement['g'] = 'c';

    ll = (int)strlen(str);
    cmpl = new char [ll+1];
    for(ii = 0; ii < ll; ii++)
    {
        cmpl[ii] = complement[(unsigned char)str[ii]];
    }

    // fold around a single unpaired base at position ii.
    for(ii = MATCH_NEEDED_IN_2ND; ii < ll - MATCH_NEEDED_IN_2ND; ii++)
    {
        prev_match = 'F';
        sum = 0;
        for(jj = 0; jj < ii; jj++)
        {
            score[jj] = 0;
            kk = ii*2 - jj;
            if(kk < ll)
            {
                if(str[jj] == cmpl[kk])
                {
                    if(prev_match == 'T')
                    {
                        score[jj] = score[jj-1] * 2;
                    }
                    else
                    {
                        score[jj] = 1;
                        prev_match = 'T';
                    }
                }
                else
                {
                    prev_match = 'F';
                }
            }
            sum += score[jj];
        }
        // fprintf(stderr, "2' sum = %d\n", sum);
        if(sum >= SCORE_NEEDED_IN_2ND)

FIG. 16I
        {
            delete [] cmpl;
            return 1; // Found a 2nd structure.
        }
    }

    // fold between positions ii and ii+1 (no unpaired base at the turn).
    for(ii = MATCH_NEEDED_IN_2ND - 1; ii < ll - MATCH_NEEDED_IN_2ND; ii++)
    {
        prev_match = 'F';
        sum = 0;
        for(jj = 0; jj <= ii; jj++)
        {
            score[jj] = 0;
            kk = ii*2 + 1 - jj;
            if(kk < ll)
            {
                if(str[jj] == cmpl[kk])
                {
                    if(prev_match == 'T')
                    {
                        score[jj] = score[jj-1] * 2;
                    }
                    else
                    {
                        score[jj] = 1;
                        prev_match = 'T';
                    }
                }
                else
                {
                    prev_match = 'F';
                }
            }
            sum += score[jj];
        }
        // fprintf(stderr, "2' sum = %d\n", sum);
        if(sum >= SCORE_NEEDED_IN_2ND)
        {
            delete [] cmpl;
            return 1; // Found a 2nd structure.
        }
    }

    delete [] cmpl;



FIG. 16J 
    return 0; // No 2nd structure.
}

// check if str1 and str2 can hybridize together.
// return the max of match scores.
// Assume strlen(str1) == strlen(str2).

int CrossHyb(const char *str1, const char *str2, int overlap)
{
    int ii, jj, len, sum, score, prev_score, max_sum, numGC;
    char prev_match;

    len = (int)strlen(str1);
    max_sum = 0;
    fprintf(fplog, "Sliding test between %s and %s\n", str1, str2);
    for(ii = overlap-len; ii <= len-overlap; ii++)
    {
        numGC = 0;
        sum = 0;
        score = prev_score = 0;
        prev_match = 'F';
        fprintf(fplog, "Compare ");
        for(jj = ii; jj < len && jj - ii < len; jj++)
        {
            if(jj >= 0 && jj - ii >= 0)
            {
                fprintf(fplog, "(%c,%c) ", str1[jj], str2[jj-ii]);
                if((str1[jj] == str2[jj-ii]) ||
                   (str1[jj] == eq1 && str2[jj-ii] == eq2) ||
                   (str1[jj] == eq2 && str2[jj-ii] == eq1))
                {
                    // The two statements marked "restored" are missing from the
                    // scanned listing and are inferred from how numGC and
                    // prev_score are used below.
                    if(((str1[jj]|32) == 'g' && (str2[jj-ii]|32) == 'g') ||
                       ((str1[jj]|32) == 'c' && (str2[jj-ii]|32) == 'c'))
                        numGC++;                    // restored
                    if(prev_match == 'T')
                    {
                        score = prev_score * 2;     // restored
                    }
                    else
                    {
                        score = 1;

FIG. 16K 
                        prev_match = 'T';
                    }
                }
                else
                {
                    score = 0;
                    prev_match = 'F';
                }
                sum += score;
                prev_score = score;
            }
        }
        fprintf(fplog, "Score=%d\n", sum + numGC*GC_ADDITION);
        if(sum + numGC*GC_ADDITION > max_sum)
            max_sum = sum + numGC*GC_ADDITION;
    }
    fprintf(fplog, "Max score is %d\n", max_sum);
    return max_sum;
}
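
The weighting used by CrossHyb and SecondStruct (the first match in a run scores 1 and each following consecutive match scores twice the previous one) can be shown in isolation. The sketch below is a simplified illustration, not part of the listing: WeightedScore is a hypothetical name, it compares the two strings at a single alignment only, and it leaves out the sliding offsets, the eq1/eq2 equivalence and the GC penalty.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: weighted match score of two equal-length strings
// compared position by position at one fixed alignment.
int WeightedScore(const char *a, const char *b)
{
    assert(std::strlen(a) == std::strlen(b));
    int sum = 0;
    int prev = 0;                       // score given to the previous position
    size_t len = std::strlen(a);
    for (size_t i = 0; i < len; ++i)
    {
        int score = 0;
        if (a[i] == b[i])
            score = (prev > 0) ? prev * 2 : 1;   // doubles along a run
        sum += score;
        prev = score;
    }
    return sum;
}
```

A run of three consecutive matches scores 1+2+4 = 7, which is the default of SCORE_NEEDED_IN_2ND, while isolated matches add only 1 each. The scheme therefore penalizes contiguous stretches far more heavily than the same number of scattered matches.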

// Compare 2 strings base to base. 0 to 0. 1 to 1..., no sliding.
// return number of matches.
// Assume strlen(str1) == strlen(str2).

int SimpleMatch(const char *str1, const char *str2)
{
    int ii, sum;
    sum = 0;
    for(ii = 0; ii < (int)strlen(str1); ii++)
    {
        if((str1[ii] == str2[ii]) ||
           (str1[ii] == eq1 && str2[ii] == eq2) ||
           (str1[ii] == eq2 && str2[ii] == eq1))
        {
            sum++;
        }
    }
    return sum;
}

FIG. 16L 
// Check if the first N bases of the two probes are identical.
char FirstN(const char *str1, const char *str2, int N)
{
    int ii;
    char match = 'T';

    if(N > (int)strlen(str1))
        return 'F';
    for(ii = 0; ii < N; ii++)
    {
        if(!((str1[ii] == str2[ii]) ||
             (str1[ii] == eq1 && str2[ii] == eq2) ||
             (str1[ii] == eq2 && str2[ii] == eq1)))
        {
            match = 'F';
            break;
        }
    }
    return match;
}

FIG. 16M 
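
The three compatibility checks in the listing share one greedy pattern: take the next probe still marked compatible, keep it, and reject every later probe that scores at or above a cutoff against it. A compact sketch of that pattern follows. SelectCompatible and CountMatches are hypothetical names, and a plain count of identical positions stands in for the CrossHyb, SimpleMatch and FirstN tests.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the pairwise tests: count identical positions.
int CountMatches(const std::string& a, const std::string& b)
{
    int n = 0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i)
        if (a[i] == b[i]) ++n;
    return n;
}

// Greedy filter: walk the list in order; each probe still marked compatible
// is kept and rejects every later probe that shares at least `cutoff`
// positions with it. Rejected probes never reject others.
std::vector<std::string> SelectCompatible(const std::vector<std::string>& probes,
                                          int cutoff)
{
    assert(cutoff > 0);
    std::vector<bool> ok(probes.size(), true);
    std::vector<std::string> kept;
    for (size_t i = 0; i < probes.size(); ++i)
    {
        if (!ok[i]) continue;
        kept.push_back(probes[i]);
        for (size_t j = i + 1; j < probes.size(); ++j)
            if (ok[j] && CountMatches(probes[i], probes[j]) >= cutoff)
                ok[j] = false;
    }
    return kept;
}
```

Because the pass is greedy, the selected set depends on the order in which candidates are enumerated. It yields a mutually compatible set but not necessarily the largest possible one.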
Key  Name
1    5' bead alone
2    5' bead/2 pM 5' c'
3    5' bead/2 pM 60mer DNA
4    5' bead/5 pM 60mer RNA trans.
5    5' bead/1 pM 60mer RNA trans.
6    5' bead/20 pM Non-specific

FIG. 18A

Key  Name
1    3' bead alone
4    3' bead/5 60mer RNA trans.
5    3' bead/1 pM 60mer RNA trans.
6    3' bead/20 pM Non-specific



(Plot not reproduced; x-axis label: FL1-H)

Key  Name
     Mid bead alone
     Mid bead/2 60mer DNA
     Mid bead/5 60mer RNA trans.
     Mid bead/1 pM 60mer RNA trans.
     Mid bead/20 pM Non-specific

FIG. 18C



(Plot not reproduced; x-axis label: FL1-H)

Key  Name
1    NS bead/2 pM 60mer DNA
2    NS bead/5 pM 60mer RNA trans.
3    NS bead/1 pM 60mer RNA trans.

FIG. 18D



INTERNATIONAL SEARCH REPORT

International Application No
PCT/US 97/23173

A. CLASSIFICATION OF SUBJECT MATTER
IPC 6 C12Q1/68

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
IPC 6 C12Q

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category *  Citation of document, with indication, where appropriate, of the relevant passages  [Relevant to claim No.]

A   WO 95 35505 A (UNIV LELAND STANFORD JUNIOR) 28 December 1995
    see page 8, line 25 - page 9, line 20; claims 18-20; examples  [1-18]

A   WO 91 14768 A (GENETYPE AG) 3 October 1991

A   WO 93 18068 A (TANOX BIOSYSTEMS INC ; CHANG TSE WEN (US)) 16 September 1993

A   PATANJALI S R ET AL: "CONSTRUCTION OF A UNIFORM-ABUNDANCE (NORMALIZED) CDNA LIBRARY"
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,
    vol. 88, no. 5, 1 March 1991, pages 1943-1947, XP000368687
    see the whole document  [30]

-/--

Further documents are listed in the continuation of box C.
Patent family members are listed in annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not
    considered to be of particular relevance
"E" earlier document but published on or after the international
    filing date
"L" document which may throw doubts on priority claim(s) or
    which is cited to establish the publication date of another
    citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or
    other means
"P" document published prior to the international filing date but
    later than the priority date claimed
"T" later document published after the international filing date
    or priority date and not in conflict with the application but
    cited to understand the principle or theory underlying the
    invention
"X" document of particular relevance; the claimed invention
    cannot be considered novel or cannot be considered to
    involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention
    cannot be considered to involve an inventive step when the
    document is combined with one or more other such documents,
    such combination being obvious to a person skilled
    in the art
"&" document member of the same patent family

Date of the actual completion of the international search
28 April 1998

Date of mailing of the international search report
08/05/1998

Name and mailing address of the ISA
European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016

Authorized officer
Molina Galan, E

Form PCT/ISA/210 (second sheet) (July 1992)

page 1 of 2



INTERNATIONAL SEARCH REPORT
Information on patent family members

International Application No
PCT/US 97/23173

Patent document           Publication   Patent family       Publication
cited in search report    date          member(s)           date

WO 9535505 A              28-12-95      AU 2862995 A        15-01-96
                                        CA 2192095 A        28-12-95
                                        EP 0804731 A        05-11-97
                                        JP 10503841 T       07-04-98

WO 9114768 A              03-10-91      US 5153117 A        06-10-92
                                        AU 649027 B         12-05-94
                                        AU 7471591 A        ?1-10-91
                                        CA 2059554 A        ?5-09-91
                                        EP 0521909 A        13-01-93
                                        IL 97577 A          23-07-96
                                        US 5447842 A        05-09-95

WO 9318068 A              16-09-93      US 5213960 A        25-05-93
                                        US 5256542 A        25-10-93
                                        US 5326596 A        05-07-94

WO 9302214 A              04-02-93      NONE

WO 9508647 A              30-03-95      US 5482845 A        09-01-96
                                        AU 7842594 A        10-04-95
                                        US 5637685 A        10-06-97

WO 9612014 A              25-04-96      US 5604097 A        18-02-97
                                        AU 3946195 A        06-05-96
                                        AU 4277896 A        06-05-96
                                        CZ 9700866 A        17-09-97
                                        EP 0786014 A        30-07-97
                                        EP 0793718 A        10-09-97
                                        FI 971473 A         04-05-97
                                        NO 971644 A         02-06-97
                                        WO 9612039 A        25-04-96
                                        US 5695934 A        09-12-97
                                        US 5635400 A        03-06-97
                                        US 5654413 A        05-08-97

Form PCT/ISA/210 (patent family annex) (July 1992)






